Journal of Applied Psychology 


John G. Darley, Editor 
University of Minnesota 


Lorraine Bouthilet, Managing Editor 





Table of Contents 


Partitioning and Saturation of Visual Displays and Efficiency of Visual Search: C. W. Eriksen 


Reduced Trials and Reliability of Visual-Acuity Thresholds Obtained by the Method of Con- 
stant Stimuli: J. Tiffin and G. F. Rabideau 


The Leadership Ideology of Aircraft Commanders: A. W. Halpin 


An sg go Analysis of Supervisory and Group Dimensions: R. C. Wilson, W. S. High, and 


The Relationship of Human Interest to Immediate Retention and to Acceptability of Technical 
Material: G. R. Klare, J. E. Mabry, and L. M. Gustafson 


Respondents Rate Public Opinion Interviewers: J. M. Brown 


ae Achievement, Interest, and Personality Tests: A Longitudinal Comparison: R. F. 


Predicting na 5 Achievement with the Differential Aptitude and the Primary Mental 
Abilities Tests: W. D. Wolking 


An Analysis of the Predictive Value of the Pre-Engineering Ability Test: F. Q. Sessions 


The Relationship Between Minnesota Teacher Attitude Inventory Scores and Scores on Certain 
Scales of the Minnesota Multiphasic Personality Inventory: W. W. Cook and D. M. 


Validities of Three Vocational Interest Keys for U.S. Navy Yeomen: D. K. Perry 
Note on Another Use of the Sentence-Completion Technique: N. Gekoski and E. S. Isard 
A Correction of “The Inference of Accident Liability from the Accident Record”: A. Mintz 


Book Reviews 





American Psychological Association 
Volume 39, Number 2 April, 1955 





Consulting Editors 


Harold E. Burtt, Ohio State University 
Alphonse Chapanis, Johns Hopkins Univer- 
sity 


Clifford E. Jurgensen, Minneapolis Gas 
Company 


Laurence S. McGaughran, University of 
Houston 


Quinn McNemar, Stanford University 

Alexander Mintz, City College of New York 

Harold F. Rothe, Fairbanks, Morse and 
Company 

Julian B. Rotter, Ohio State University 

Donald E. Super, Columbia University 

Miles A. Tinker, University of Minnesota 





This journal gives primary consideration to origi- 
nal investigations in any field of applied psychol- 
ogy except clinical and consulting psychology, al- 
though a descriptive or theoretical article may be 
accepted if it represents a special contribution in 
an applied field. Quantitative investigations of interest 
or value to psychologists working in the following 
broad fields will be considered: vocational 
and educational prognosis, diagnosis, and guidance 
at the secondary and college level; personnel re- 
search in business, industry, and government; bio- 
mechanics; industrial working conditions; research 
on opinion and morale factors; job analysis and 
classification research; market and advertising re- 
search. 


Because of the large number of manuscripts submitted, 
authors should adhere to the rule of “brevity consistent 
with clarity.” The typical manuscript should run to 
approximately 4,000 words. There is a lag of approximately 
twelve months between receipt and publication of an 
article. Authors may request advanced publication if they 
are prepared to pay the cost of printing the necessary 
extra pages. 


Manuscripts should be addressed to the Editor, 
John G. Darley, 408 Johnston Hall, University of 
Minnesota, Minneapolis 14, Minnesota. All manu- 
scripts should be submitted in duplicate. Original 
figures are prepared for publication; duplicate fig- 
ures may be photographic or pencil-drawn copies. 

Manuscripts must conform to the style require- 
ments described in the “Publication Manual of the 
American Psychological Association,” Psychol. Bull., 
1952, 49, No. 4, Part 2. 





Journal of Applied Psychology 


Published bimonthly by the 
American Psychological Association 
Prince and Lemon Sts., Lancaster, Pa. 
and 1333 Sixteenth Street N.W. 
Washington 6, D. C. 


$7.00 per volume 


$1.50 per issue 


Subscriptions, orders, and business communications should be addressed to the American Psychological Association, 
1333 Sixteenth St. N.W., Washington 6, D. C. Address changes must reach the subscription office by the 25th of 
the month to take effect the following month. Undelivered copies resulting from address changes will not be replaced; 
subscribers should notify the post office that they will guarantee second-class forwarding postage. Other claims for 
undelivered copies must be made within four months of publication. 


Entered as second-class matter, August 19, 1943, at the post office at Lancaster, Pa., under the act of March 3, 1879. 
Acceptance for mailing at the special rate of postage provided for in paragraph (d-2), Section 34.40, P. L. & R. 
of 1948, authorized October 10, 1947. 


Copyright, 1955, by the American Psychological Association, Inc. 













Partitioning and Saturation of Visual Displays and 
Efficiency of Visual Search¹ 


Charles W. Eriksen 
The Johns Hopkins University 


The development of our present technology 
has associated with it, in many cases, a 
need for quick and accurate methods of pro- 
viding human operators with a wide variety 
of essential information. A frequently used 
method of meeting this need is the visual dis- 
play. Visual displays are used for a multi- 
tude of purposes and are of divers forms. 
They vary from the fairly simple graphical 
display of statistics to the highly complex 
displays used in modern control and com- 
munication centers. The present study is 
concerned with certain perceptual problems 
involved in the visual display of information. 

In using visual displays a common task 
confronting the observer is one of visual 
search. Here the observer is required to scan 
the display in order to detect the presence or 
location of particular signals or other en- 
coded information. In this task the struc- 
ture or characteristics of the perceptual field 
(display) may be expected to influence his 
performance. Previous research (2), for ex- 
ample, has shown that the heterogeneity of 
the field in terms of such visual dimensions 
as hue, form, size, and brightness is a sig- 
nificant factor in search time. Search time 
in general was found to increase with increas- 
ing heterogeneity of the field. 

Besides heterogeneity on various dimensions, a display 
or perceptual field can vary in terms of a number of 
characteristics. One of these is the amount of information 
it contains. A display may be virtually saturated with 
signals or the signals may be few in number and widely 
scattered throughout the display. Perceptually we may 
consider this variable as relating to the ratio of figure 
to ground and we might expect that the speed with which 
an observer can scan the display in search of specified 
signals would be affected by this ratio. 

1 This research was performed under Contract AF 
33(038)-22642 between the Aero Medical Laboratory, 
Wright Air Development Center and The Johns Hopkins 
University. 

Displays also vary in the natural or arti- 
ficial subdivisions or partitionings that exist 
in them. In many displays grid lines or co- 
ordinates are used to partition the display. 
While these grid lines may be helpful in 
specifying the location of a signal, their ef- 
fect upon speed and accuracy of visual search 
is unevaluated. 

The purpose of the present experiment was 
to determine the effect upon visual search 
time of three variables: saturation, partition- 
ing, and search area. Speed of visual search, 
as measured by the time it took a subject (S) 
to locate a constant number of signals in a 
display, was determined when the following 
factors were varied: (a) the number of nontarget 
signals in the display, (b) the number 
of partitionings of the display, and (c) the 
total area of search. 


Method 


Apparatus. A square display, perpendicular to S’s 
line of sight, was used. The center of the display 
was 60 in. from the floor. It was painted a flat 
white and was ruled off into equal-sized partitions 
by the use of 1/16-in. black lines. A %4-in. electrode 
was located at the bottom of each of the partitions 
and a tiny hook at the top. The number of par- 
titions and the area of the display were varied in 
the experiment. A sliding panel was used to hide 
the display from S. When a catch was released, the 
panel dropped, giving S a full view of the display. 
The dropping panel also tripped a microswitch 
which in turn activated an electric timer. The S 
located objects in the display by touching a stylus 
to the electrode in the proper partition. 







A control panel, which provided connections with 
each of the electrodes on the face, was mounted on 
the back of the display. Ten telephone jacks could 
be plugged into these connections so as to obtain 
any desired pattern of ten electrodes. These jacks 
were connected to a system of relays, which were 
tripped whenever S touched the stylus to the corre- 
sponding electrode. When all ten relays had been 
tripped, in any order, they completed a circuit that 
stopped the timer. 

The signals placed in the display were cut out of 
a dark gray cardboard. They were of three shapes: 
triangles, squares, and diamonds. All were of a constant 
size, % in. on their maximum vertical and 
horizontal dimensions. Ten triangles were always 
used as the target signals, and the number of squares 
and diamonds placed in the display was varied ex- 
perimentally. 

Subjects. A total of 36 students at the Johns 
Hopkins University was used, 20 males and 16 fe- 
males. No attempt was made to control for sex 
differences since a previous study involving a simi- 
lar task (1) had shown no evidence of sex difference 
on this type of perceptual-motor task. 

Design. The three experimental variables were 
(a) display saturation or the number of objects 
placed in the display, (b) partitioning of the dis- 
play by the use of grid lines, (c) the display area. 
For the saturation variable, S was required to locate 
the ten target objects when the display contained 
a total of 20, 40, 60, and 80 signals. Partitioning 
was varied by ruling the display off into a 
9 × 9, a 13 × 13, and a 16 × 16 matrix by the use 
of 1/16-in. wide lines. 

There are two ways in which we can vary the 
number of partitions in the display. One is to hold 
constant the area of the display as we vary the num- 
ber of partitions. In this case the size of the indi- 
vidual partitions will vary. The other method is to 
hold constant the size of the individual partitions, 
but allow the total area of the display to vary. In 
the present study both methods of varying the num- 
ber of partitions were used. This was accomplished 
by using three different display areas, 18, 24, and 
32 in. square. Thus it was possible to compare dif- 
ferences between number of partitions when a con- 
stant-sized display was used and also when the size 
of the partitions in the display was kept constant. 

The three experimental variables were arranged in 
a 3 × 4 × 3 factorial design. Twelve Ss were assigned 
at random to each of the three display sizes. 
Each S was tested once under each of the 12 com- 
binations of the three different partitionings and 
four saturations. Within each display size, the or- 
der in which Ss were tested on the various com- 
binations of saturations and partitioning was coun- 
terbalanced in a latin-square design. 
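
The counterbalancing described above can be illustrated with a short sketch. The source does not say which latin square was used, so the cyclic construction below, and the names `CONDITIONS`, `cyclic_latin_square`, and `orders_for_subjects`, are assumptions made only for illustration.

```python
from itertools import product

# The 12 treatment combinations: 3 partitionings x 4 saturations.
PARTITIONINGS = ["9x9", "13x13", "16x16"]
SATURATIONS = [20, 40, 60, 80]
CONDITIONS = list(product(PARTITIONINGS, SATURATIONS))


def cyclic_latin_square(n):
    """Return an n x n cyclic latin square: row i is 0..n-1 rotated by i."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]


def orders_for_subjects(conditions):
    """Build one testing order per subject (hypothetical assignment).

    Every condition appears exactly once in each row (subject) and once in
    each column (ordinal position).
    """
    square = cyclic_latin_square(len(conditions))
    return [[conditions[k] for k in row] for row in square]


orders = orders_for_subjects(CONDITIONS)
for subject, order in enumerate(orders, start=1):
    print(subject, order[:3], "...")
```

With 12 Ss per display size and 12 combinations, each S receives a different row of the square, so every combination occupies every ordinal position once within a display size.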

Subjects were run individually. They were in- 
structed to locate the ten triangles in the display as 
rapidly as possible by touching the stylus to the 
electrode in the partition where the triangles were 
placed. In order to eliminate one source of error, 
Ss were instructed to count the triangles so as to 
be sure that they had located all ten. The S’s score 
was the time required to locate the ten triangles.² 
Search time was measured to 1/100 sec. All Ss had 
received extensive practice on a similar task prior to 
beginning the experimental trials. 

There were always ten target signals in the display. 
They were located among the partitions in the display 
according to 12 patterns. These patterns had been 
selected to meet the following criteria: (a) No two 
target signals were to be in adjacent partitions; and 
(b) at least one target signal was to occur in the top 
two rows of the display and at least one in the bottom 
row. Within display sizes, the 12 patterns were varied 
among the 12 combinations of the different numbers of 
partitions and saturations in a latin square different 
from that used for counterbalancing Ss. No S obtained 
the same pattern twice. 

2 Although errors were recorded they were so very 
rare that no analysis of them was merited. 


Table 1 

Summary of the Analysis of Variance 

Source                                          Sum of Squares   Mean Square/Error Term = F 

Total                                                9,684.1 
Display saturation                                   4,267.5     1422.5/12.1 = 117.2** 
Display partitions                                     579.7     289.9/16.7 = 17.3** 
Display area                                           588.5     294.2/61.9 = 
Subjects                                             2,042.7     61.9/ 
Saturation × partitions                                 91.4     15.2/8.5 
Saturation × area                                       47.5     7.9/12.1 
Partition × area                                        14.1     3.5/16.7 
Saturation × partitions × area                          40.5     3.4/8.5 = 
Within-subjects error terms 
  Pooled subjects by saturation                      1,201.7     12.1/ 
  Pooled subjects by partitions                      1,102.8     16.7/ 
  Pooled subjects by saturations by partitions       1,688.5     8.5/ 
  Composite within-subject error                     3,992.9     11.0/ 

* Significance beyond the .05 level. 
** Significance beyond the .001 level. 


Table 2 

Mean Search Time in Seconds for Different Saturations, 
Partitions, and Display Area 

                 Saturation (number of objects in display) 
Matrix              20        40        60        80 

18-in. Square Display 
9 × 9              8.62     10.82     13.06     16.55 
13 × 13            9.60     13.56     15.83     18.98 
16 × 16            9.88     14.34     16.98     19.95 

24-in. Square Display 
9 × 9             11.57     13.94     16.20     19.88 
13 × 13           11.86     15.91     18.06     20.55 
16 × 16           12.71     16.05     20.94     22.80 

32-in. Square Display 
9 × 9              9.79     12.98     13.79     15.79 
13 × 13           10.71     13.16     16.44     18.44 
16 × 16           11.58     13.96     18.20     19.49 
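
The two criteria for target patterns stated in the Method section can be expressed as a small check. This is a sketch under stated assumptions, not the procedure actually used: the source does not say how the 12 patterns were drawn, nor whether "adjacent" includes diagonal neighbours, so the random sampling, the diagonal-inclusive adjacency rule, and the names `is_valid` and `sample_pattern` are all hypothetical.

```python
import random


def is_valid(cells, n):
    """Check the two stated criteria for a set of (row, col) target cells."""
    # (a) No two targets in adjacent partitions. Assumption: adjacency
    # includes diagonal neighbours; the source does not define it.
    for r1, c1 in cells:
        for r2, c2 in cells:
            if (r1, c1) != (r2, c2) and abs(r1 - r2) <= 1 and abs(c1 - c2) <= 1:
                return False
    # (b) At least one target in the top two rows and one in the bottom row.
    has_top = any(r <= 1 for r, _ in cells)
    has_bottom = any(r == n - 1 for r, _ in cells)
    return has_top and has_bottom


def sample_pattern(n, n_targets=10, rng=None, max_tries=100_000):
    """Draw random target placements in an n x n grid until one is valid."""
    if rng is None:
        rng = random.Random()
    all_cells = [(r, c) for r in range(n) for c in range(n)]
    for _ in range(max_tries):
        cells = rng.sample(all_cells, n_targets)
        if is_valid(cells, n):
            return sorted(cells)
    raise RuntimeError("no valid pattern found")


pattern = sample_pattern(9, rng=random.Random(0))
print(pattern)
```

Rejection sampling is used only because it is short; any construction that satisfies both criteria would serve equally well.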


Results 


The search times obtained under the various 
treatment conditions were subjected to a modified 
four-way analysis of variance (saturation, partitions, 
display size, and individuals).³ A summary of this 
analysis is shown in Table 1. 

Two of the experimental variables, saturation and 
partitioning, were significant beyond the .001 level 
of confidence and the third variable, display area, 
was significant at the .05 level. However, none of the 
interactions between these three variables approached 
significance. Table 2 presents the mean search times 
for the various combinations of saturations and 
partitions for each display area. Increasing the 
saturation or the number of signals in the display 
leads to an increase in search time. Also, the more 
the display is partitioned off by the use of grid 
lines, the longer is the time required for locating 
the target signals. 

3 The method of analysis is identical with that 
described by Lindquist (4) under Plan VI. The search 
times were examined for evidence of skewed 
distributions, but they appeared from examination to 
be relatively normal. In view of this and also taking 
into consideration the report (5) that moderate 
departures from normality have little effect upon 
significance levels, it was considered unnecessary to 
use a logarithmic or reciprocal transformation on the 
time scores. Bartlett’s test when applied to the data 
offered no basis for rejecting the null hypothesis 
that the variances were homogeneous. 
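
The ratios listed in Table 1 can be retraced from the sums of squares. In this sketch the sums of squares and the pairing of each effect with its error term are taken from Table 1; the degrees of freedom are not legible in the table and are derived here from the design (4 saturations, 3 partitionings, 3 areas, 12 Ss per area), so they should be read as an assumption. The names `EFFECTS`, `ERROR_TERMS`, and `f_ratios` are illustrative.

```python
# Error terms: (sum of squares from Table 1, df derived from the design).
ERROR_TERMS = {
    "subjects": (2042.7, 33),
    "subjects x saturation": (1201.7, 99),
    "subjects x partitions": (1102.8, 66),
    "subjects x saturation x partitions": (1688.5, 198),
}

# Effects: (sum of squares, derived df, error term used in Table 1).
EFFECTS = {
    "saturation": (4267.5, 3, "subjects x saturation"),
    "partitions": (579.7, 2, "subjects x partitions"),
    "area": (588.5, 2, "subjects"),
    "saturation x partitions": (91.4, 6, "subjects x saturation x partitions"),
    "saturation x area": (47.5, 6, "subjects x saturation"),
    "partitions x area": (14.1, 4, "subjects x partitions"),
    "saturation x partitions x area": (40.5, 12, "subjects x saturation x partitions"),
}


def mean_square(ss, df):
    """Mean square = sum of squares divided by degrees of freedom."""
    return ss / df


def f_ratios(effects, error_terms):
    """F = effect mean square / mean square of its error term."""
    ratios = {}
    for name, (ss, df, error_name) in effects.items():
        error_ss, error_df = error_terms[error_name]
        ratios[name] = mean_square(ss, df) / mean_square(error_ss, error_df)
    return ratios


F = f_ratios(EFFECTS, ERROR_TERMS)
for name, value in F.items():
    print(f"{name}: F = {value:.2f}")
```

Under these assumed degrees of freedom the mean squares agree with those printed in Table 1 (for example 4267.5/3 = 1422.5 for saturation), and the ratios for saturation and partitions come out close to the reported 117.2 and 17.3.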

These effects can be more readily perceived 
graphically. In Fig. 1 search time has been 
plotted as a function of display saturations 
and number of partitions. Search time has 
been averaged through display areas. As is 
apparent, search time increases monotonically 
as the number of nontarget signals in the dis- 
play increases. In fact, the function would 
appear to be linear, but the points are too 
few to be confident of the exact shape of the 
function. It can also be seen from this graph 
that search time is less when the display has 
the minimum number of partitions. The 
9 × 9 partitioning results in the quickest 
search for all saturations and the 16 × 16 the 
slowest. This latter effect is further illus- 
trated in Fig. 2 where search time has been 
plotted as a function of the number of par- 
titions in the display and the display area. 

From Fig. 2 it is apparent that the rela- 
tion between search time and total search 
area is not a monotonic one. The slowest 
location is obtained with the 24-in. square 
display. On the other hand, location is 
nearly as rapid with the 32-in. square dis- 
play as it is with the 18 in. 
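
The non-monotonic relation with display area can be checked by averaging the cell means of Table 2 within each display size. The sketch below assumes the cell values as read from Table 2 (the assignment of values to cells follows the table layout) and uses the illustrative names `TABLE_2` and `area_mean`.

```python
# Cell means in seconds as read from Table 2, keyed by display side (inches).
# Each list runs over saturations of 20, 40, 60, and 80 signals.
TABLE_2 = {
    18: {"9x9": [8.62, 10.82, 13.06, 16.55],
         "13x13": [9.60, 13.56, 15.83, 18.98],
         "16x16": [9.88, 14.34, 16.98, 19.95]},
    24: {"9x9": [11.57, 13.94, 16.20, 19.88],
         "13x13": [11.86, 15.91, 18.06, 20.55],
         "16x16": [12.71, 16.05, 20.94, 22.80]},
    32: {"9x9": [9.79, 12.98, 13.79, 15.79],
         "13x13": [10.71, 13.16, 16.44, 18.44],
         "16x16": [11.58, 13.96, 18.20, 19.49]},
}


def area_mean(rows):
    """Unweighted mean of all 12 cell means for one display size."""
    values = [t for row in rows.values() for t in row]
    return sum(values) / len(values)


area_means = {side: area_mean(rows) for side, rows in TABLE_2.items()}
for side, mean in area_means.items():
    print(f"{side}-in. display: {mean:.2f} sec")
```

On these values the 24-in. display gives the largest mean, roughly 16.7 sec., while the 18- and 32-in. displays are close to each other at roughly 14.0 and 14.5 sec.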

As was pointed out above, when a display is 
partitioned into a greater number of areas, the size 
of the individual partitions or cells is decreased. 
It is conceivable, therefore, that the differences in 
search time obtained with differing numbers of 
partitions may be due to increasing difficulty in 
identifying or discriminating a signal in a smaller 
partition. As a check on this possibility, the three 
display sizes were so chosen that the partitions 
resulting from the 9 × 9 partitioning of the smallest 
display were the same size as the cells resulting from 
the 13 × 13 partitioning of the middle-sized display 
and the 16 × 16 partitioning of the largest display. 
The failure to obtain a significant interaction 
between the three experimental variables indicates 
that the effects upon search time due to partitioning 
cannot be ascribed to the varying size of the 
individual partitions or cells in the display. 

[Figure 1: search time t in seconds plotted against 
number of signals in display (including 10 target 
signals), with separate curves for 81, 169, and 256 
partitions.] 

Fig. 1. Search time as a function of display 
saturation and partitioning. Search time has been 
averaged through the three display areas and each 
point is the mean for 36 Ss. 

[Figure 2: search time t in seconds plotted against 
√ number of partitions, with separate curves for the 
18-, 24-, and 32-in. square displays.] 

Fig. 2. Search time as a function of display 
partitioning and area. Search time has been averaged 
through saturations and each point is the mean of 
48 scores on each of 12 Ss. 

This can be clearly seen from Fig. 2. If the effect 
upon search time due to the number of partitions was 
solely the result of the smaller size of the cells, it 
would be expected that the search time for the 9 × 9 
partitioning of the 18-in. matrix, the 13 × 13 
partitioning of the 24-in. matrix, and the 16 × 16 
partitioning of the 32-in. matrix would be 
approximately equal except for a slight difference 
resulting from differences in total search area. 
Comparison of these three points in Fig. 2 shows that 
increases in search time associated with increases in 
the number of partitions are much greater than can be 
attributed to differences in the total search area or 
size of the display. 
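
The comparison of the three matched points can be retraced numerically. This sketch assumes the cell means as read from Table 2 and computes a nominal cell side as display side divided by the number of rows, ignoring the width of the grid lines; `MATCHED_CELLS` and `cell_side` are illustrative names.

```python
# (display side in inches, rows per side, mean search times in seconds at
# saturations of 20, 40, 60, and 80 signals, as read from Table 2)
MATCHED_CELLS = [
    (18, 9, [8.62, 10.82, 13.06, 16.55]),
    (24, 13, [11.86, 15.91, 18.06, 20.55]),
    (32, 16, [11.58, 13.96, 18.20, 19.49]),
]


def cell_side(display_side, n_rows):
    """Nominal cell side in inches, ignoring the width of the grid lines."""
    return display_side / n_rows


summary = []
for side, n_rows, times in MATCHED_CELLS:
    mean_time = sum(times) / len(times)
    summary.append((side, n_rows, cell_side(side, n_rows), mean_time))

for side, n_rows, cell, mean_time in summary:
    print(f"{side}-in., {n_rows} x {n_rows}: "
          f"cell {cell:.2f} in., mean search time {mean_time:.2f} sec")
```

On this nominal reckoning the cells are 2.0 in. for the 18- and 32-in. displays and about 1.85 in. for the 24-in. display, so the match in cell size is approximate. The mean search times at the three points nonetheless differ by several seconds.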


Discussion 


The results can be briefly summarized as 
follows: (a) As the number of signals or ob- 
jects in the display was increased, the time 
required for the visual location of a constant 
number of specific signals also increased. 
(b) When the visual field or display was partitioned 
into 81, 169, and 256 equal sectors 
by the use of grid lines, the efficiency of 
visual search decreased with the increase in 
the number of partitions. (c) As the size of 
the display or search area was varied, visual 
search was most efficient when the search 
area was 18 and 32 in. square and least ef- 
fective when 24 in. square. 

The first of these findings agrees quite well with 
our intuitive expectations. It seems reasonable that 
it would be more difficult to locate a particular 
object or objects when the number of objects among 
which they were interspersed was increased. In the 
present case it would appear that this effect can be 
explained or understood in terms of the number of 
visual fixations required for the accomplishment of 
the task. Here the triangles were easily discriminable 
from the squares and diamonds as indicated by the fact 
that Ss rarely made an error by touching the stylus to 
the electrode beneath an incorrect signal. While 
peripheral vision is adequate for detecting the 
presence of a signal against the display field, it 
would appear that foveal vision was used or required 
to make the discrimination between target and 
nontarget signals.⁴ Detection by peripheral vision 
leads to a brief foveal fixation while the signal is 
identified as a target or nontarget signal. In the 
simplest case where only target signals are present in 
the display, only as many fixations as there are 
target signals are required. However, increasing the 
number of nontarget signals in the display leads to 
more visual fixations. Every signal picked up in 
peripheral vision leads to a foveal fixation. Thus if 
search time is primarily dependent upon the time for 
fixation, increasing the number of objects in the 
display would be expected to lead to a linear increase 
in search time. Such a linear relationship was 
suggested by the obtained data. 

4 While good form discrimination can be obtained in 
peripheral vision by training, it must be remembered 
that the present Ss were untrained. It also seems 
likely that even with training, if the task permits, 
foveal vision will be used for discriminations in 
preference to peripheral cues. 
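
The suggested linear relation between search time and number of signals can be examined with an ordinary least-squares line through the four saturation means. The sketch assumes the cell means as read from Table 2, averaged over display areas and partitionings; with only four points it illustrates the trend rather than establishing the shape of the function. `CELLS` and `fit_line` are illustrative names.

```python
# Table 2 cell means (seconds) grouped by saturation: nine cells each,
# covering 3 display areas x 3 partitionings.
CELLS = {
    20: [8.62, 9.60, 9.88, 11.57, 11.86, 12.71, 9.79, 10.71, 11.58],
    40: [10.82, 13.56, 14.34, 13.94, 15.91, 16.05, 12.98, 13.16, 13.96],
    60: [13.06, 15.83, 16.98, 16.20, 18.06, 20.94, 13.79, 16.44, 18.20],
    80: [16.55, 18.98, 19.95, 19.88, 20.55, 22.80, 15.79, 18.44, 19.49],
}


def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept, sxy ** 2 / (sxx * syy)


signals = sorted(CELLS)
means = [sum(CELLS[s]) / len(CELLS[s]) for s in signals]
slope, intercept, r_squared = fit_line(signals, means)
print(f"slope = {slope:.3f} sec per signal, intercept = {intercept:.2f} sec, "
      f"r^2 = {r_squared:.3f}")
```

On these means the fitted slope is about 0.14 sec. per added signal, which is the kind of constant increment the fixation account would predict.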

Explanations for the effects obtained by in- 
creasing the number of partitions of the dis- 
play are not readily apparent. The increase 
in search time with increases in the number 
of partitions has been shown in the present 
data to be due to the effect of the number of 
partitions per se and not to the absolute size 
of the separate divisions. It is possible that 
the presence of grid lines in the display has 
a distracting effect upon visual search. French 
(3) has shown that irrelevant signals in a 
display impede the recognition of target sig- 
nals. He has termed this effect “visual noise” 
on the analogy of noise in an auditory sys- 
tem. However, labeling such effects as visual 
noise does not contribute to an understand- 
ing of the effects. 

A likely explanation for the present data 
can be obtained if we examine how Ss ap- 
proach the task of visual search and what use 
they make of grid lines. Grid lines in a dis- 
play not only break the display up into cells 


and partitions; they also organize the display 


into columns and rows. This organization of 
the display may be expected to affect S’s 
method of search. It would appear that Ss 
scan a display by either columns or rows 
when these are present in the display. Since 
increasing the number of partitions in a dis- 
play results in an increase in the number of 
columns or rows that S must scan, a linear 
relation might be expected between the num- 
ber of partitions and the speed of search. 
The data in Fig. 2 do suggest such a linear 
relationship. 

The findings in the present study with re- 
spect to search area may well be an artifact 
of the method the Ss were required to use to 
indicate that they had located a target signal. 
The slower search obtained for the 24-in. 
square display, as compared with the 18- and 
32-in. square displays, may reflect a change 
in the perceptual motor task. It was ob- 
served that Ss tended to use eye and head 
movements to scan the 18- and 24-in. dis- 
plays, meanwhile holding the stylus in a fixed 
position. With the 32-in. display, Ss tended 
to move the stylus back and forth across the 
rows of the display in synchrony with their 
visual scanning. When the stylus is held in 
a fixed position, increasing the area of the 
display required longer movements in order 
to touch the electrode beneath the signal. 
However, when the area of the display be- 
comes great enough, the shift over to a motor 
scanning again reduces the extent of the re- 
quired movements once a target signal has 
been identified. Such an interpretation sug- 
gests that visual search, independent of the 
motor response necessary to signify location, 
is largely independent of the display areas 
used in the present experiment. 


Summary 


The present study was concerned with the 
effect of several characteristics of visual dis- 
plays upon speed of visual search. The time 
required to locate a constant number of sig- 
nals in a square display was determined 
when: (a) the number of irrelevant signals 
was varied from 10 to 70 and (b) the number 
of partitions of the display was varied by 
the use of grid lines. Grid lines were used 
to partition the display into a 9 × 9, a 13 × 13, 
and a 16 × 16 matrix. 

The results show that search time increases 
both when the number of irrelevant signals 
is increased and when the number of par- 
titions is increased. An explanation was ad- 
vanced for these effects in terms of the num- 
ber of foveal fixations required for signal 
identification and the use that observers 
make of grid lines in their plan of search. 


Received April 23, 1954. 


References 


1. Eriksen, C. W. Location of objects in a visual 
display as a function of the number of dimensions 
on which the objects differ. J. exp. Psychol., 
1952, 44, 56-60. 

2. Eriksen, C. W. Object location in a complex 
perceptual field. J. exp. Psychol., 1953, 45, 
126-132. 

3. French, R. S. Pattern recognition in the presence 
of visual noise. J. exp. Psychol., 1954, 
47, 27-32. 

4. Lindquist, E. F. Design and analysis of experiments 
in psychology and education. Boston: 
Houghton Mifflin, 1953. 

5. Norton, D. W. An empirical investigation of 
some effects of non-normality and heterogeneity 
on the F-distribution. Unpublished doctor’s 
dissertation, State Univer. of Iowa, 1952. 







Reduced Trials and Reliability of Visual-Acuity Thresholds 
Obtained by the Method of Constant Stimuli 


Joseph Tiffin and Gerald F. Rabideau 


Occupational Research Center, Purdue University 


In recent years the psychophysical method of 
constant stimuli has been applied to numerous 
psychotechnological problems. Wendt (13) reported 
its use in determining the effect of floor tile size 
on the apparent brightness of patterns of such tile, 
and in a similar study of the effect of golf ball 
cover markings on the balls’ apparent size. 

Ferguson (3) adapted the constant process 
to the item-selection method based on the 
method of constant stimuli. Harrison and 
Harrison (8) reported a constant-process 
modification which they developed in cereal 
research. Application of the method of con- 
stant stimuli to the photographic problems of 
determining the relationships between cer- 
tain physical variables and psychological 
variables involved in print quality was de- 
scribed by Jones (9). 

One factor which has tended to retard wide- 
spread practical application of the constant 
process is the extensive time which is neces- 
sary to obtain data involving numerous stimu- 
lus presentations and extensive tallying and 
computations. 

An experiment was recently conducted at 
the Occupational Research Center labora- 
tories to determine what effect the reduction 
in length of trial series has upon the retest 
reliabilities of absolute thresholds for visual 
acuity. Measurement of visual acuity was 
done by means of the psychophysical method 
of constant stimuli and the computational 
Müller-Urban curve-fitting method. 

There has been a dearth of work on this 
problem. Classical psychophysical investi- 
gators prior to this century agreed that 500 
to 700 judgments, or trials, were necessary to 
the establishment of meaningful thresholds 
by the method of constant stimuli. As late 
as 1910 there had been no systematic in- 
vestigation of the problem. Whipple (14), 
in his manual, suggested the use of as few as 
ten judgments per stimulus. Fernberger (4) 




advocated in 1914 the elimination of the ex- 
treme stimuli and reduction of the number 
of comparison stimuli, therefore, from seven 
to five, for purposes of determining differ- 
ence thresholds. He found that this reduc- 
tion cut experimental time by “nearly one 
third,” and that the thresholds and error 
measures were not markedly affected. 

Fernberger (5) subsequently investigated 
the smallest number of trials which yielded 
a satisfactorily reliable threshold, concluding 
that “we suggest, on the basis of our results, 
that 50 judgments upon each of five compari- 
son pairs is a sufficient number to require 
from each subject for ordinary anthropo- 
metric study” (p. 271). Boring (1) dis- 
agreed, asserting that it is impossible to limit 
arbitrarily the number of observations, and 
that as few as ten observations per stimulus 
might be warranted wherever the error meas- 
ures of such stimuli were sufficiently small 
to warrant their statistical comparison. “The 
required number varies with the use that is 
to be made of the limen,” he wrote. “A 
limen has scientific value only as it may be 
compared with other limens” (p. 315). 

The above controversy continued for a 
time, with Fernberger (6) reiterating his con- 
clusions and defining his stand. It does ap- 
pear that Boring favored more than 50 trials 
per stimulus. He stated (2, p. 291) that 
“one does not ordinarily wish to take less 
than 100 series” (100 × 5 stimulus values 
equals 500 trials). 

These early approaches all involved con- 
sideration of estimates of absolute reliability. 
Another controversy during this century has 
been over the correctness of various formulas 
suggested to estimate that reliability. 

In 1928 Newhall (11) concluded from an 
investigation of computational methods that 
150 judgments yielded thresholds of the same 
order and variability, when determined by 
means of normal interpolation, as did the 







Urban constant method with 450 judgments. 
He recommended the former method of curve 
fitting on the basis of its labor-saving virtues. 
Newhall published the first study of relative 
reliability measures in this area; all his co- 
efficients were in excess of .94. 

Linder (10), in an investigation of psycho- 
physical methods, substantiated Newhall with 
respect to the practical superiority of inter- 
polation methods when trials and number of 
stimulus intensities were both reduced and 
reliability was the criterion. When larger 
numbers of trials and classical numbers of 
stimuli were used, there were no significant 
differences between either of the above methods. 

Linder recommended that greatest efficiency in 
curve fitting could be achieved by concentrating 
most of one’s observations on approximately two 
stimulus values subtending the approximate 
thresholds, and by applying interpolation 
curve-fitting methods. 

Tiffin and Rabideau (12), using a constant method 
of curve fitting, obtained test-retest reliabilities 
of .94 to .96 with 50-trial series (250 trials per 
threshold) and 13 subjects (Ss). The present paper 
is an immediate outgrowth of that investigation, and 
it proposes to distribute trials over five stimulus 
values and systematically reduce the number of trials 
utilized in computation of the threshold, with retest 
reliability as a criterion of practicality. 


Materials and Procedure 


The target involved in this experiment was the 
Landolt ring, a nonletter target which has had ex- 
tensive experimental application to problems of 
visual acuity. The Landolt ring consists of a broken 
circle, utilizing both minimum separable and mini- 
mum visible elements as its critical measures. In 
the current experiment it was randomly presented 
in each of four positions; the opening was placed 
at the top, right, bottom, and left, from S’s view- 
point. 

Sizes of Landolt rings were graded by a decimal 
acuity series, a geometric progression of steps, in 
which the decimal notation, d, may be defined as:

$$d = \frac{1}{a}$$

where a equals the visual angle in minutes sub- 
tended by the unit design of the target. In the 
case of the Landolt ring, the gap and stroke width 


are the unit design, and the outside diameter of the 
ring subtends five times that visual angle. 
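
The geometry described here can be checked with a short sketch. It assumes the usual definition of decimal acuity, d = 1/a with a in minutes of arc, and takes the 34.5-foot optical distance from the apparatus description; the function name and the choice of inches as the output unit are illustrative, not from the article.

```python
import math

ARCMIN_PER_RADIAN = 60.0 * 180.0 / math.pi


def landolt_dimensions(decimal_acuity, distance_ft=34.5):
    """Return (gap, outside_diameter) in inches for a Landolt ring.

    Assumes d = 1/a, where a is the visual angle (minutes of arc)
    subtended by the gap and stroke width.
    """
    visual_angle_min = 1.0 / decimal_acuity
    angle_rad = visual_angle_min / ARCMIN_PER_RADIAN
    distance_in = distance_ft * 12.0
    # Exact size of an object subtending angle_rad at distance_in.
    gap = 2.0 * distance_in * math.tan(angle_rad / 2.0)
    # Small-angle approximation: the diameter is five unit widths.
    return gap, 5.0 * gap


gap, diameter = landolt_dimensions(1.0)
print(f"gap = {gap:.4f} in, outside diameter = {diameter:.4f} in")
```

Under these assumptions a ring graded d = 1.0 has a gap of roughly an eighth of an inch at the viewing distance used.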

The target was illuminated by means of a single 
incandescent source, with the target brightness being 
approximately ten foot-lamberts. A rotating device 
was used to hold the target and to change the po- 
sition of its gap. An occluder was operated by the 
experimenter (E) to cut off S’s view of the target
between trials. The optical distance was the equiva- 
lent of 34.5 feet, with S viewing the target by means 
of a mirror. This situation, made necessary by room 
dimensions, placed S and E adjacent to each other. 
They were separated by a screen. General illumina- 
tion in the experimental room was approximately 
four foot-candles. This prevented the Ss’ eyes from 
becoming completely dark adapted. Normal heat- 
ing and ventilating conditions were present. All ex- 
perimental sessions were conducted between 8 A.M. 
and 5 P.M.

Fifteen volunteer Ss were used, each one being 
individually tested. They were Caucasians of both 
sexes, 18 to 25 years old. Both graduate and un- 
dergraduate college students were represented in the 
sample. Both eyes were used by Ss in making 
their judgments, and hence in determination of the 
thresholds. Those Ss who normally wore corrective 
lenses did so during the experiment. 

Each S was seated in the observer's position, and 
was orally instructed regarding the discrimination 
that he was to make and how he was to respond. 
The S was told to guess whenever he was uncertain 
whether the ring’s opening was at the top, right, 
bottom, or left of the configuration. A change in 
response for any trial was allowed at any time prior 
to the presentation of the following trial. 

The E presented a limited number of practice 
trials with varied sizes of targets in order to deter- 
mine the size range which was to be used with the 
method of constant stimuli. The largest target was
selected so that S responded correctly to it less than
100 per cent of the time, and the smallest so that S
responded correctly at a greater-than-chance per-
centage.

In the actual test, two separate series, called “Data
A” and “Data B,” were presented in succession,
separated by a five-minute rest interval. Each se- 
ries consisted of the same five target sizes, which 
were each separately presented to S for 50 trials, 
with the targets being presented in descending order 
of size. 

The E recorded whether each response was cor- 
rect or incorrect. At the end of the trials for each 
target size, E informed S of the number of correct 
judgments which he had made. This was done to 
facilitate maintenance of S’s motivation. 

Intertrial intervals were not timed. The E estab- 
lished a regular pacing with each S on the basis of 
his speed of response. 

Table 1

Visual-Acuity Thresholds for 15 Ss Computed from First 5, First 10, First 20, First 30, First 40, and 50 Trials

Trial n’s          50                40
Data            A       B         A       B

              1.02    1.07      1.01    1.05
              1.16    1.23      1.14    1.18
              1.38    1.43      1.40    1.39
              1.22    1.18      1.21    1.14
              1.37    1.41      1.34    1.32
              1.08    1.10      1.07    1.10
              1.36    1.39      1.35    1.39
              0.62    0.62      0.62    0.65
              1.08    1.18      1.10    1.18
              1.12    0.98      1.12    1.01
              0.83    0.79      0.83    0.79
              1.23    1.33      1.11    1.35
              0.86    0.80      0.89    0.78
              1.13    1.17      1.13    1.15
              1.26    1.27      1.22    1.31

Means         1.12    1.13      1.10    1.12

Columns for the smaller trial n’s are only partly legible: one Data B column reads 1.00, 1.24, 1.42, 1.14, 1.34, 1.08, 1.40, 0.57, 1.16, 0.98, 0.80, 1.02, 0.76, 1.13, 1.25, and the remaining legible means are 1.10, 1.12, 1.09, 1.06, and 1.16.

* These thresholds incalculable because of wide divergence of P’s from normal ogive.

The first computational step was the calculation
of percentages of correct responses in the first 5, 10,
20, 30, 40, and all 50 trials per stimulus size for
each S. This was done separately for Data
A and Data B. Next, a correction for chance
successes was applied to each P. The correction-for-
guessing formula used was:

$$P_c = \frac{P_o - 25}{75} \times 100$$

with P_c being the corrected per cent correct for any
stimulus, and P_o the uncorrected per cent.
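
The first two computational steps can be sketched as follows. This is a minimal illustration that assumes the standard correction for chance with four equally likely response alternatives (chance = 25 per cent); the function names and the sample response record are hypothetical.

```python
def percent_correct(responses, n_trials):
    """Per cent correct in the first n_trials responses (1 = correct, 0 = incorrect)."""
    first = responses[:n_trials]
    return 100.0 * sum(first) / len(first)


def correct_for_guessing(p_observed, n_choices=4):
    """Rescale an observed per cent correct so that chance maps to 0 and 100 stays 100.

    Assumes n_choices equally likely alternatives; values below chance
    come out negative and are not clamped here.
    """
    chance = 100.0 / n_choices
    return (p_observed - chance) / (100.0 - chance) * 100.0


# Hypothetical record of ten judgments at one target size.
record = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
for n in (5, 10):
    p = percent_correct(record, n)
    print(n, p, round(correct_for_guessing(p), 1))
```

Running the same two functions over the first 5, 10, 20, 30, 40, and 50 responses at each target size would give the corrected percentages that enter the curve fitting.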

Guilford (7) outlines the computations involved
in the Müller-Urban constant method. This pro-
cedure was followed to obtain thresholds, precision
measures (h), and standard deviations for each S,
for both Data A and B, and for trial n’s (per
threshold) of 25, 50, 100, 150, 200, and 250.
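
A rough sketch of this kind of curve fitting is given below. It is not Guilford's tabled procedure: it assumes that the Müller-Urban weights can be approximated by w = φ(z)²/(pq), fits the line z = a + bx by weighted least squares, and takes the threshold as the stimulus value at which z = 0. The precision measure follows the conventional relation h = 1/(σ√2). The function name and the skipping of proportions equal to 0 or 1 are choices made for this illustration.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def fit_ogive(stimulus_values, proportions):
    """Weighted least-squares fit of normal deviates on stimulus values.

    Returns (threshold, sd, h). Proportions of exactly 0 or 1 have no
    finite normal deviate and are skipped.
    """
    points = []
    for x, p in zip(stimulus_values, proportions):
        if 0.0 < p < 1.0:
            z = _STD.inv_cdf(p)
            # Assumed stand-in for the Mueller-Urban weight.
            w = _STD.pdf(z) ** 2 / (p * (1.0 - p))
            points.append((x, z, w))
    if len(points) < 2:
        raise ValueError("need at least two usable proportions")
    sw = sum(w for _, _, w in points)
    mx = sum(w * x for x, _, w in points) / sw
    mz = sum(w * z for _, z, w in points) / sw
    sxx = sum(w * (x - mx) ** 2 for x, _, w in points)
    sxz = sum(w * (x - mx) * (z - mz) for x, z, w in points)
    slope = sxz / sxx
    intercept = mz - slope * mx
    threshold = -intercept / slope      # stimulus value where p = .50
    sd = 1.0 / abs(slope)
    h = 1.0 / (sd * math.sqrt(2.0))
    return threshold, sd, h
```

With five stimulus values per threshold, as in the present experiment, at least two corrected proportions must lie strictly between 0 and 1 for the fit to be defined.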

Pearson product-moment correlations were com-
puted between thresholds of Data A and B, for the
respective trial n’s, in order to obtain retest re-
liability coefficients for those trial n sizes.
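
The retest coefficient is an ordinary product-moment correlation between the paired Data A and Data B thresholds. A minimal sketch, with a hypothetical function name and made-up threshold values, is:

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired lists of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Data A and Data B thresholds for five Ss.
data_a = [1.02, 1.16, 1.38, 0.62, 0.83]
data_b = [1.07, 1.23, 1.43, 0.62, 0.79]
print(round(pearson_r(data_a, data_b), 3))
```

One such coefficient would be computed for each trial n size, using only the Ss whose thresholds could be calculated in both series.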


Results 


Visual acuity thresholds for all Ss for both
Data A and B and all trial n’s are shown in
Table 1. Mean thresholds for the trial n
sizes are reported at the bottom of that table.

Table 2 contains retest reliability coeffi-
cients for the trial n size thresholds. It is
noticeable that these coefficients remain rela-
tively constant with decreases in trial n size
until fewer than 10 trials per stimulus are in-
volved in computation of thresholds.

Table 2

Test-Retest Reliability Coefficients for Thresholds
Obtained from Various-Sized Trial n’s Ranging
from 5 to 50 Trials Per Stimulus Value

Trial n
Size
50
40
30
20
10
5

All the reliability coefficients in Table 2
are significantly greater than zero at the 1
per cent confidence level, except the coefficient
for thresholds based on a trial n size of
5 (only 25 trials per threshold), which
was only .35. That coefficient fails
to be statistically significant at the 5 per cent
level. The divergence of that coefficient from
the general trend established by the remain-
ing reliability estimates may be due, in part,
to the fact that three Ss’ thresholds could not
be determined at the 5-trial n level because
of the wide divergence of the p values from
a normal ogive function. This may be a re-
sult of a greater fluctuation in correctness of
responses during the first few judgments Ss
made on presentation of each different stimu-
lus size. The cause of the phenomenon is not
explainable by means of these data, however.
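
The significance statement for the 5-trial coefficient can be illustrated with the usual t test for a correlation. The sketch assumes that the coefficient rests on 12 Ss (15 less the three whose thresholds could not be computed), which the article does not state explicitly; the critical value 2.228 is the standard two-tailed 5 per cent point of t for 10 degrees of freedom.

```python
import math


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Assumed n = 12 for the 5-trial coefficient of .35.
t = t_for_r(0.35, 12)
T_CRIT_05_DF10 = 2.228
print(round(t, 2), t > T_CRIT_05_DF10)
```

Under that assumption t falls well short of the critical value, in line with the statement that the coefficient is not significant at the 5 per cent level.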

The relatively high reliability of as few as 
10 trials per stimulus (or 50 per threshold) 
indicates the desirability, in this instance, of 
substituting a short series for the more time- 
consuming longer one. 


Summary 


A psychophysical investigation was made 
to find what effect a reduction in trials per 
stimulus has upon reliability of visual-acuity 
thresholds computed by the method of con-
stant stimuli and the Müller-Urban curve-fitting
method. A goal was to find a practical mini-
mum number of trials.

Visual-acuity thresholds were determined 
for 15 college students, who were each sub- 
jected to two successive measurement series 
of stimuli. Thresholds were computed, using 
five stimulus values and 5, 10, 20, 30, 40, and 
50 presentations per stimulus. The target 
used was the Landolt ring. 

Retest reliabilities of .92 and higher were 
obtained for as few as 10 trials per stimulus 
(50 per threshold). 

The following conclusions are warranted: 


1. Insofar as this kind of psychophysical 
measurement may be typical of much previ- 
ously published work, classical psychophysi- 
cal investigators probably tended to utilize 
more judgments than are necessary for com- 
putation of reliable thresholds. 

2. The experiment suggests that, for pur-
poses of estimating absolute thresholds, the
Müller-Urban constant method with five
stimulus sizes yields satisfactorily
reliable thresholds for practical
applications when as few as 10 trials per
stimulus and 4-choice response categories are
used. This conclusion, of course, applies
only under the conditions of the present ex-
periment. These conditions were: (a) vision
was binocular; (b) acuity design was the
Landolt ring; (c) illumination in the experi-
mental room was approximately four foot-
candles; (d) brightness of the targets was
approximately ten foot-lamberts; and (e) the
research considered only absolute, not differ-
ential, thresholds.


3. Further experimentation is advisable be-
fore the findings are generalized beyond the
experimental conditions listed above.


Received March 24, 1954. 


References 


1. Boring, E. G. The number of observations upon
which a limen may be based. Amer. J. Psy-
chol., 1916, 27, 315-320.

2. Boring, E. G. Urban’s tables and the method
of constant stimuli. Amer. J. Psychol., 1917,
28, 280-293.

3. Ferguson, G. A. Item selection by the constant
process. Psychometrika, 1942, 7, 19-29.

4. Fernberger, S. W. On the elimination of the
two extreme intensities of the comparison
stimuli in the method of constant stimuli.
Psychol. Rev., 1914, 21, 335-355.

5. Fernberger, S. W. The effects of practice in its
initial stages in lifted weights experiments and
its bearing on anthropometric measurements.
Amer. J. Psychol., 1916, 27, 261-272.

6. Fernberger, S. W. Concerning the number of
observations necessary for the determination
of a limen. Psychol. Bull., 1917, 14, 110-113.

7. Guilford, J. P. Psychometric methods. New
York: McGraw-Hill, 1936.

8. Harrison, S., & Harrison, Margaret J. A psy-
chophysical method employing a modification
of the Müller-Urban weights. Psychol. Bull.,
1951, 48, 249-256.

9. Jones, L. A. Psychophysics and photography.
J. opt. Soc. Amer., 1944, 34, 66-68.

10. Linder, F. E. A statistical comparison of psy-
chophysical methods. Psychol. Monogr., 1933,
44, No. 3 (Whole No. 199), 1-20.

11. Newhall, S. M. An interpolation procedure for
calculating thresholds. Psychol. Rev., 1928,
35, 46-66.

12. Tiffin, J., & Rabideau, G. F. Harrison & Har-
rison’s modification of the Müller-Urban
weights. Psychol. Bull., 1953, 50, 474-476.

13. Wendt, G. R. Two industrial applications of a
psychophysical method. J. appl. Psychol.,
1932, 16, 269-276.

14. Whipple, G. M. Manual of mental and physi-
cal tests. Baltimore: Warwick & York, 1910.





The Journal of Applied Psychology 
Vol. 39, No. 2, 1955 


The Leadership Ideology of Aircraft Commanders ¹


Andrew W. Halpin 


Personnel Research Board, The Ohio State University 


This is an analysis of the relationship be- 
tween aircraft commanders’ ideology of how 
they should behave as leaders and their 
crews’ perception of how they do behave on 
two dimensions of leader behavior: Initiat- 
ing Structure and Consideration. Halpin and 
Winer (9), in describing the development of 
the Air Force adaptation of a Leader Behav- 
ior Description Questionnaire devised origi- 
nally by Hemphill and Coons (11), have 
defined these two dimensions. Initiating 
Structure denotes the commander’s behavior 
in organizing and defining the relationship 
between himself and the members of the 
crew; in establishing clear patterns of organi- 
zation, channels of communication, and ways 
of getting the job done. This corresponds 
essentially to Homans’ concept of the leader 
“originating interaction” (12). Considera- 
tion refers to behavior indicative of friend- 
ship, mutual trust and respect, and good “hu- 
man relations” between the commander and 
his crew. These two empirically identified 
dimensions of leader behavior are congruent 
with the two group objectives defined by 
Cartwright and Zander: 


It appears that most, or perhaps all, group ob-
jectives can be subsumed under one of two head-
ings: (a) the achievement of some specific group
goal, or (b) the maintenance or strengthening of the
group itself. Examples of member behaviors that
serve functions of goal achievement are “initiates
action,” “keeps members’ attention on the goal,”
“clarifies the issue,” “develops a procedural plan,”
“evaluates the quality of work done,” and “makes
expert information available.” Examples of behav-
iors that serve functions of group maintenance are
“keeps interpersonal relations pleasant,” “arbitrates
disputes,” “provides encouragement,” “gives the mi-
nority a chance to be heard,” “stimulates self-direc-
tion,” and “increases the interdependence among
members.” Any given behavior in a group may
have significance both for goal achievement and
maintenance. Both may be served simultaneously
by the actions of a member, or one may be served
at the expense of the other (2, p. 541).

1 This research was supported in part by the
United States Air Force under Contract No. AF
18(600)-1051, monitored originally by the Human
Factors Operations Research Laboratories, and more
recently by the Director, Crew Research Laboratory,
Air Force Personnel and Training Research Center,
Randolph Air Force Base, Texas. Permission is
granted for reproduction, translation, publication,
use, and disposal in whole and in part by or for the
United States Government. The author is especially
indebted to Miss Florence D. Bennis for her con-
scientious assistance in analyzing the data.

Fleishman (3, 4, 5, 6) and Harris (10), in 
turn, have modified the Air Force adaptation 
of the Leader Behavior Description Question- 
naire for use in their studies of factory fore- 
men, and have found these two leader be- 
havior dimensions equally relevant in an 
industrial setting. Halpin (8) has reported 
the relationship between the aircraft com- 
mander’s behavior on these dimensions and 
evaluations of his performance made both by 
his superiors and his crew members; and 
elsewhere (7) has presented evidence which 
indicates that the most “effective” command- 
ers are those who score high on both dimen- 
sions of leader behavior. This means, in 
Barnard’s terms (1), that these commanders 
are effective in that they accomplish the ob- 
jectives of the crew; and efficient in that they 
satisfy the motives of individual crew mem- 
bers. 

The present study deals with two specific 
questions: (a) How do aircraft commanders 
believe they should behave in respect to the 
Initiating Structure and the Consideration 
dimensions of leader behavior? (b) How
much agreement is there between the com- 
manders’ ideology of how they should behave 
as leaders and their crews’ perception of how 
they do behave on these two dimensions? 


Procedure 


The subjects were 132 B-29 and B-50 aircraft
commanders. These aircraft are essentially similar,
with an 11-man crew on the B-29 and a 10-man
crew on the B-50. Seventy-six of the commanders
were studied in the Far East Air Force at the time
they were flying combat missions over Korea.² The
other 56 commanders were members of select Stra-
tegic Air Command crews undergoing evaluation in
this country. As measured in this study, the two
groups of commanders did not differ significantly
either in leadership ideology or leader behavior;
hence both groups have been combined into a single
sample.

2 These data were collected in Japan during the
summer of 1951 by Dr. John K. Hemphill and Lt.
Colonel Fred Holdrege, as members of a Human
Resources Research Laboratories research team.

The 132 commanders were described on the Leader 
Behavior Description Questionnaire (LBDQ-Real) 
by 1,103 members of their crews. On the average, 
8.3 descriptions (σ = 1.6) were secured for each com-
mander. The Leader Behavior Description Ques-
tionnaire is in multiple-choice format, and contains
80 items. The crew members indicated the fre- 
quency with which the commander engaged in the 
behavior described by marking, for each item, a 
statement containing one of five adverbs: always, 
often, occasionally, seldom, or never. The following 
items are illustrative: 


Initiating Structure 


He makes his attitudes clear to the crew. 
He speaks in a manner not to be questioned. 
He maintains definite standards of performance. 


Consideration 


He is easy to understand. 

He does little things to make it pleasant to be a 
member of the crew. 

He gets crew approval on important matters 
before going ahead. 


Each of the two dimensions contains 15 items. 
Each item is scored from 0 to 4; hence the theoreti- 
cal range of scores for each dimension is from 0 to 
60. The estimated split-half reliabilities of the keys 
are .83 and .92, respectively. 
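
The scoring rule can be sketched as below. The mapping of adverbs to the values 0 to 4 (with "always" scored 4) is an assumption made for illustration; the article gives only the 0-to-4 range and does not say whether any items are keyed in the reverse direction. The function names are hypothetical.

```python
# Assumed keying: the article states only that each item is scored 0 to 4.
ADVERB_SCORES = {"always": 4, "often": 3, "occasionally": 2, "seldom": 1, "never": 0}


def dimension_score(responses):
    """Sum the item scores for the 15 items of one dimension (range 0 to 60)."""
    if len(responses) != 15:
        raise ValueError("each dimension contains 15 items")
    return sum(ADVERB_SCORES[r] for r in responses)


def crew_mean_score(crew_responses):
    """Average one dimension's score over all crew members describing a commander."""
    scores = [dimension_score(r) for r in crew_responses]
    return sum(scores) / len(scores)


print(dimension_score(["often"] * 15))
```

Averaging over the crew members who described a commander gives a single index per dimension for that commander.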

The 132 commanders answered the Leader Be- 
havior Description Questionnaire, but with different 
instructions. They indicated how they believed they 
should behave as leaders. This form of the ques- 
tionnaire will be referred to as the LBDQ-Ideal, in 
contrast to the LBDQ-Real that was answered by 
the crew members. For the LBDQ-Ideal the esti- 
mated split-half reliabilities are .69 for the Initiat- 
ing Structure scores, and .66 for the Consideration 
scores. 

The 1,235 completed questionnaires were scored 
on the Initiating Structure and the Consideration 
dimensions. For the LBDQ-Ideal, each command- 
er’s scores were used to represent his ideology in 
respect to the two leader behavior dimensions. In 
the case of the LBDQ-Real, answered by the crew 
members, it was necessary first to determine for 
each dimension separately whether the between- 
crew variance was significantly greater than the 
within-crew variance. Both F ratios were signifi- 
cant at the .01 level. The corresponding unbiased 
correlation ratios are .44 for Initiating Structure, 
and .61 for Consideration, indicating that crew mem- 
bers tend to agree in describing their commander’s 
leader behavior. Accordingly, crew mean Initiating
Structure scores and crew mean Consideration scores
were used as indices of the commander’s behavior on
these two dimensions. An analysis was made of the
relationship between the LBDQ-Real and the LBDQ-Ideal
scores.

Table 1

Means, Standard Deviations, F Test of Homogeneity of
Variance, and t Ratios for Difference Between
Means for Initiating Structure and Consid-
eration Scores on the Leader Behavior
Description Questionnaire, Ideal
and Real (N = 132)

                Initiating Structure      Consideration
                Mean       σ              Mean       σ
LBDQ-Ideal      51.0       4.6            48.7       5.3
LBDQ-Real       40.9       4.9            39.7       8.0
F                    1.13                      2.28*
t                   18.50**                   11.69**

* Significant at the .01 level of confidence.
** Significant at the .001 level of confidence.
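
The between-crew versus within-crew comparison described in the procedure can be sketched with a one-way analysis of variance. The article does not say which "unbiased correlation ratio" was computed; the sketch assumes Kelley's epsilon-squared, 1 − MS_within / MS_total, and it is not stated whether the reported .44 and .61 are epsilon or its square. The function name and the toy data are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, epsilon_squared) for a list of groups of scores.

    F is the between-group mean square over the within-group mean square.
    epsilon_squared = 1 - MS_within / MS_total (Kelley's unbiased ratio, assumed).
    """
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for g in groups:
        mean_g = sum(g) / len(g)
        ss_between += len(g) * (mean_g - grand_mean) ** 2
        ss_within += sum((x - mean_g) ** 2 for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    ms_total = (ss_between + ss_within) / (n_total - 1)
    return ms_between / ms_within, 1.0 - ms_within / ms_total


# Toy example: three "crews" of three describers each.
f_ratio, eps_sq = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_ratio, round(eps_sq, 3))
```

A large F with a sizable correlation ratio is what justifies using the crew mean as the index of each commander's behavior.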


Results 


Table 1 presents the means, standard devia-
tions, tests of homogeneity of variance, and t
ratios for the difference between the means of
the Initiating Structure and Consideration 
scores on the LBDQ-Ideal as compared with 
the LBDQ-Real. 

The correlation between the two dimension 
scores on the LBDQ-Ideal is .29; on the 
LBDQ-Real, .45.³ The correlation between
the LBDQ-Ideal and LBDQ-Real Initiating 
Structure scores is .14, which fails of signifi- 
cance. Between the paired Consideration 
scores, the correlation is .17, which barely 
achieves significance at the .05 level. 

3 Both correlations are significant at the .01 level.
The latter figure is consistent with a correlation of
.45 found between these same dimensions for an in-
dependent sample of aircraft commanders (9).

If the aircraft commanders’ leader behav-
ior as perceived by their own crew members
is used as a base line, then the commanders’
statement of how they should behave in Initi-
ating Structure represents an ideal 2.1 stand-
ard deviations above the level at which they
are perceived as functioning in this respect.
Similarly, the commanders’ statement of how
they should behave in regard to Considera-
tion constitutes an ideal 1.1 standard devia-
tions above the level at which they are per-
ceived to behave. With a correlation of .45
between the two leader behavior dimensions
on the LBDQ-Real, the probability that any
commander will be described at or above
both the Initiating Structure and the Con-
sideration mean values expressed by the com-
manders on the LBDQ-Ideal is approximately
.009 (13, p. 91).
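
The arithmetic behind these figures can be reproduced approximately. The sketch below takes the means and standard deviations from Table 1, expresses each Ideal mean in standard-deviation units of the Real distribution, and then integrates a standard bivariate normal distribution numerically. Bivariate normality and the midpoint integration are assumptions of this sketch; the article itself used Pearson's tables (13), so only agreement in order of magnitude should be expected.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def joint_upper_tail(h, k, rho, steps=4000, upper=8.0):
    """P(X > h and Y > k) for a standard bivariate normal with correlation rho.

    Uses P = integral over x > h of pdf(x) * P(Y > k | X = x), midpoint rule.
    """
    scale = math.sqrt(1.0 - rho * rho)
    width = (upper - h) / steps
    total = 0.0
    for i in range(steps):
        x = h + (i + 0.5) * width
        total += _STD.pdf(x) * (1.0 - _STD.cdf((k - rho * x) / scale))
    return total * width


# Ideal mean minus Real mean, in Real standard-deviation units (Table 1).
z_structure = (51.0 - 40.9) / 4.9
z_consideration = (48.7 - 39.7) / 8.0
p_both = joint_upper_tail(z_structure, z_consideration, 0.45)
print(round(z_structure, 1), round(z_consideration, 1), round(p_both, 4))
```

The two standardized differences round to the 2.1 and 1.1 given in the text, and the joint probability comes out near one in a hundred.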


Summary 


In expressing their leadership ideology, the 
aircraft commanders clearly recognize the de- 
sirability of scoring high on both dimensions 
of leader behavior.⁴ This is consistent with
the finding reported elsewhere (7) that the 
most “effective” commanders are those who 
do score high on both dimensions. But the 
relationship between the commanders’ state- 
ment of how they should behave and their 
behavior as described by their crews is negli- 
gible. In the case of the Initiating Structure 
dimension, the correlation does not differ sig- 
nificantly from zero; and although the corre- 
sponding correlation of .17 for the Considera- 
tion dimension is significant at the .05 level 
of confidence, this represents only a low de- 
gree of association. The moderate reliability 
of the Initiating Structure and Consideration 
scales and the fact that the distributions are 
not entirely normal probably contribute to 
the low magnitude of the correlations ob- 
tained. Nevertheless, the evidence suggests 
that on the whole the aircraft commander’s 
knowledge of how he should behave as a 
leader has little bearing upon how he is per- 
ceived as behaving by the members of his 
crew. 

4 The difference between the Ideal and the Real is
more pronounced on Initiating Structure than on
Consideration; but the variance among commanders
in their Consideration, as perceived by their crews,
is significantly greater than the variance in their own
statements of how they should behave in respect to
Consideration.

Obviously, how the leader believes he should
behave constitutes a “fact” of a kind. There
is no intention to imply that the leader’s be-
liefs on this score are unimportant, or that
they are unrelated to other aspects of his be-
havior and the behavior of the group. They
may be; but this remains to be demonstrated.
The only point at issue here is that his beliefs
about his leadership behavior are not highly
associated with his behavior as his crew per-
ceives it. On the basis of these findings,
those engaged in leadership training programs
should be especially wary about accepting
trainees’ statements of how they should be-
have as evidence of any parallel changes in
their behavior.


Received April 9, 1954. 


References 


1. Barnard, C. I. The functions of the executive.
Cambridge, Mass.: Harvard Univer. Press,
1938.

2. Cartwright, D., & Zander, A. (Eds.) Group
dynamics: research and theory. Evanston,
Ill.: Row, Peterson, 1953.

3. Fleishman, E. A. Leadership climate and su-
pervisory behavior. Columbus, O.: Personnel
Res. Bd., Ohio State Univer., 1951.

4. Fleishman, E. A. Leadership climate, human
relations training and supervisory behavior.
Personnel Psychol., 1953, 6, 205-222.

5. Fleishman, E. A. The description of supervisory
behavior. J. appl. Psychol., 1953, 37, 1-6.

6. Fleishman, E. A. The measurement of leader-
ship attitudes in industry. J. appl. Psychol.,
1953, 37, 153-158.

7. Halpin, A. W. Studies in aircrew composition:
Study No. 3—The combat leader behavior of
B-29 aircraft commanders. USAF, Hum. Fact.
Operat. Res. Lab., Memo, 1953, No. TN-54-7.

8. Halpin, A. W. The leadership behavior and
combat performance of airplane commanders.
J. abnorm. soc. Psychol., 1954, 49, 19-22.

9. Halpin, A. W., & Winer, B. J. The leadership
behavior of the airplane commander. Co-
lumbus, O.: Ohio State Univer. Res. Found.,
1952. (Tech. Rep. III prepared for Hum.
Resour. Res. Lab., Dept. of the Air Force
under Contracts AF 33(038)-10105 & AF
18(600)-27.)

10. Harris, E. F. Measuring industrial leadership
and its implications for training supervisors.
Unpublished doctor’s dissertation, Ohio State
Univer., 1952.

11. Hemphill, J. K., & Coons, A. E. Leader be-
havior description. Columbus, O.: Personnel
Res. Bd., Ohio State Univer., 1950.

12. Homans, G. C. The human group. New York:
Harcourt, Brace, 1950.

13. Pearson, K. Tables for statisticians and biomet-
ricians. Part II. London: Cambridge Uni-
ver. Press, 1931.





The Journal of Applied Psychology 
Vol. 39, No. 2, 1955 


An Iterative Analysis of Supervisory and Group Dimensions 


Robert C. Wilson and Wallace S. High 


University of Southern California 


and Andrew L. Comrey 


University of California, Los Angeles 


A series of studies designed to discover 
variables related to organizational effective- 
ness has been conducted at the University of 
Southern California (1, 2, 9).¹ Question-
naires have been administered to individuals
within organizational work units and the data 
analyzed against criterion measures of work 
unit effectiveness to determine if the indi-
viduals in the “more effective” work units
answer the questions differently from those 
in the “less effective” work units. 

The questionnaires contain groups of homo- 
geneous items or “dimensions” developed for 
the purpose of assessing characteristics of 
organizations hypothesized to have some 
relationship to their effective operation. A 
previous article (10) reports the results of a 
factor analysis of some dimensions of super- 
visory and group behavior. In that study 
seven factors were obtained. 

In order to weed out those items correlat- 
ing highly with more than one factor and to 
reduce the number of items necessary to 
cover the same material in future studies of 
supervisory and group performance, a modi- 
fied Wherry-Gaylord iterative item analysis 
was undertaken (7). The present article de- 
scribes the application of this method and 
presents the final derived list of relatively 
independent item pools with their known in- 
ternal characteristics. It is hoped that these 
will be of value to other investigators study- 
ing supervisory behavior, group attitudes, and 
employee morale. 


1 This research was carried out under Contract 
N6 onr 238-15 between the University of Southern 
California and the Office of Naval Research. The 
opinions expressed are our own and are not neces- 
sarily shared by the Office of Naval Research. The 
project is directed by J. M. Pfiffner, with J. P. 
Guilford and H. J. Locke as associate responsible 
investigators. R. C. Wilson is now with Reed Col- 
lege, Portland, Oregon. 


Procedure 


A 108-item questionnaire was administered at the 
Long Beach Naval Shipyard to 100 civilian journey- 
men or skilled tradesmen who work on all phases of 
ship overhaul, repair, and construction for the U. S. 
Navy. The questionnaire contained 13 dimensions 
or homogeneous groups of multiple-choice items.²
The dimensions were designed to measure perceived 
characteristics of the relationship between the re- 
spondents and their supervisors such as: Participa- 
tion, Lack of Arbitrariness, Nonapprehension of Au- 
thority, Being Informed, Feedback, Attitude Toward 
Safety Enforcement, and Social Nearness; as well as 
such attitudes and interactions among members of 
the respondents’ work group as: Pride in Work 
Group, Absence of Dissension, Friendly Group At- 
mosphere, Group Cohesion, Intensity of Informal 
Control, and Lack of Informal Pressures to Restrict 
Production. 

The measuring instrument for each dimension
consisted of six or eight items put in the following
objective form:


If some worker gets too “eager,” employees
put pressure on him to make him quit
working so hard:

always
usually
sometimes
rarely
never


The five response categories were arranged on a 
continuum from a response which expressed infre- 
quency or very little of the variable in question, in 
this case Lack of Informal Pressures to Restrict 
Production, to a response which expressed frequency 
or a great deal of the variable in question. The re- 
sponses were weighted from one to five, with a 
weight of “one” assigned to a low degree of the 
variable and “five” to the high-degree response. 

2 The complete set of questionnaire items has been
deposited with the American Documentation Insti-
tute. Order Document No. 4116 from ADI Aux-
iliary Publications Project, Photoduplication Service,
Library of Congress, Washington 25, D. C., remit-
ting in advance $1.75 for microfilm or $2.50 for
photocopies. Make checks payable to Chief, Photo-
duplication Service, Library of Congress.

Table 1

Data on the Sample

                                   Mean
Age                                41.6
Years in shipyard                   4.3
Years in civil service              6.3
Highest school grade completed     10.9

To facilitate computation of reliability estimates
and to provide more variables for the analysis, the
six or eight items for each dimension were separated
into comparable halves or subdimensions of 3 or 4
items. An individual’s total score for each sub-
dimension was obtained by adding the weights as-
signed to his responses for the items in that par-
ticular subdimension.

Factor analysis of the subdimensions, using Thur- 
stone’s complete centroid method (6), yielded seven 
identifiable factors. These were named: Supervisor- 
Subordinate Rapport, Congenial Work Group, In- 
formal Control, Group Unity, Attitude Toward 
Safety Enforcement, Social Nearness, and Pride in 
Work Group. Since the ultimate purpose was to 
cover the same ground with a smaller number of 
items, these factors were used as a starting point for 
the iterative analysis. Instead of beginning the iter- 
ative analysis with a total score based upon all 
items in the inventory, scores were derived for the
seven obtained centroid factors by summing the
weights for all items contained in subdimensions
with loadings greater than .40 on that factor. This
procedure has the advantage of greatly reducing the
number of iterations required. Wherry, Perloff, and 
Campbell (8) present more fully the arguments fa- 
voring this procedure. 
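
The derivation of a starting score for one factor can be sketched as follows. The data structures, item labels, and loadings are hypothetical and serve only to show the selection rule (subdimensions loading above .40) and the summing of response weights.

```python
def factor_score(responses, subdimension_items, loadings, cutoff=0.40):
    """Sum one respondent's weights over items in high-loading subdimensions.

    responses: item label -> response weight (1 to 5)
    subdimension_items: subdimension name -> list of item labels
    loadings: subdimension name -> loading on the factor of interest
    """
    selected = [
        item
        for name, items in subdimension_items.items()
        if loadings.get(name, 0.0) > cutoff
        for item in items
    ]
    return sum(responses[item] for item in selected)


# Hypothetical example with three subdimensions of three items each.
subdimension_items = {
    "participation_a": ["q1", "q2", "q3"],
    "participation_b": ["q4", "q5", "q6"],
    "group_cohesion_a": ["q7", "q8", "q9"],
}
loadings = {"participation_a": 0.62, "participation_b": 0.55, "group_cohesion_a": 0.12}
responses = {"q1": 5, "q2": 4, "q3": 3, "q4": 2, "q5": 4, "q6": 5, "q7": 1, "q8": 1, "q9": 2}
print(factor_score(responses, subdimension_items, loadings))
```

Starting each item pool from such a factor score, rather than from a total over all 108 items, is what shortens the subsequent iterations.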


The Iterative Method 


While the authors of the iterative pro- 
cedure (7, 8) have described the method in 
detail, a description of the present applica- 
tion is in order since a few modifications 
were made. The computations were begun 
by dichotomizing each item as close to the 
median score as possible. 


1. Each individual’s responses were re- 
corded on IBM answer sheets. 

2. Factor scores were derived by summing 
the weights (1, 2, 3, 4, or 5) assigned to the 
responses for all items in subdimensions hav- 
ing loadings greater than .40 on a factor in 
the previous analysis (10). 

3. According to Flanagan’s short-cut
method for computing biserial correlations
(4), the answer sheets were divided into five 
groups of 9, 20, 42, 20, and 9 per cent on the 


basis of total score derived in step 2 above. 
Biserial correlations between all items in the 
inventory and the total item-pool score were 
computed using Flanagan’s tables (3). 

4. Steps 2 and 3 were repeated for the six 
remaining item pools corresponding to the 
centroid factors. 

5. A second item pool was derived for each 
factor by selecting those items having high 
correlations with the first total factor score 
and rejecting those items with low correla- 
tions. It should be noted that this is a weak 
point in the method since there is no clear- 
cut objective criterion by which to retain or 
reject items for the first iteration. Our pro- 
cedure was to keep those items whose item- 
pool correlation was about .60 or greater. 
However, an item which correlated .60 or 
greater with the pool was sometimes rejected 
because (a) there was a gap in the distribu-
tion of r_bis between the given item and a
group of items with higher item-pool correla-
tions, or (b) the item correlated as highly or
higher with another item pool.³ In the latter
case the items were rejected to avoid overlap 
of items among the several pools. 

6. New total scores were derived for each 
of the second item pools, summing the weights 
of responses for items in the revised pools. 

7. Using the new total scores obtained in 
step 6, steps 3 and 5 were repeated with 
items being added or withdrawn from the 
pools as their correlations increased or de- 
creased (subject to the conditions noted un- 
der step 5). The entire process was re- 
peated until the item pools were stabilized, 
i.e., until there were no large changes in item- 
pool correlations. This point was reached for 
most item pools after two or three iterations. 

8. It was found that one pool contained 
more than twice as many items as other pools. 
Examination revealed that most of the items 
fell into one of two content areas, Lack of 
Arbitrariness and Communication. It was 
decided to iterate these two subpools further 
to determine if a useful separation could be 
made between them. Accordingly, the items 
were divided into two groups on the basis of 
their apparent content and steps 3 to 7 were 
repeated. After two iterations the subpools 


3 This procedure was suggested by Dr. Wherry in 
a personal communication to the authors. 





Supervisory and Group Dimensions 


were stabilized and their intercorrelation 
computed. The resulting correlation of .56 
was regarded as being low enough to warrant 
maintaining the two item pools separately 
for future use. 

9. The final item-total biserial correlations 
were converted to point-biserial correlations 
and corrected for spurious part-whole corre- 
lation with the aid of Guilford’s abac (5). 
The corrected point-biserial correlations were 
then changed back to biserial correlations. 
These correlations, herein considered as fac- 
tor loadings, are reported in Table 2. 

10. Wherry (7) considers the correlation 
of an item with its pool as corresponding to 
a centroid factor loading. He recommends 
plotting the pairs of item pools as would be 
done in an oblique centroid factor analysis 
and then transforming the structure to or- 
thogonality by means of a procedure devel- 
oped by Doolittle. Rotation of the ortho- 
gonal structure is then recommended to 
achieve a more meaningful solution. This 
procedure was not followed in the present 
case because the authors felt that the un- 
rotated item pools were adequate for our 
purposes. Table 3 shows the intercorrela- 
tions among the resulting item pools and 


Table 4 shows their split-half reliabilities, 
means, and standard deviations. 
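
The iterative steps above can be sketched in code. What follows is a minimal illustration under stated assumptions, not the authors' computation: it substitutes an ordinary Pearson correlation between each item and the pool total for the Flanagan table look-up of step 3, and it applies only the .60 retention rule of step 5 (the gap and overlap rules are left out). The name `refine_pool` and its parameters are hypothetical.

```python
import numpy as np


def refine_pool(responses, start_pool, cutoff=0.60, max_iter=10):
    """Iteratively purify one item pool (hypothetical helper).

    responses  : (n_people, n_items) array of response weights 1..5
    start_pool : column indices of the items in the first item pool
    cutoff     : minimum item-pool correlation for an item to be kept
    """
    responses = np.asarray(responses, dtype=float)
    pool = sorted(set(start_pool))
    r = np.zeros(responses.shape[1])
    for _ in range(max_iter):
        # Steps 2 and 6: total score is the sum of weights over pool items.
        total = responses[:, pool].sum(axis=1)
        # Step 3 stand-in: Pearson r of every item with the pool total
        # (an assumption; the article used Flanagan's biserial tables).
        r = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                      for j in range(responses.shape[1])])
        # Step 5: keep items whose item-pool correlation reaches the cutoff.
        new_pool = sorted(np.flatnonzero(r >= cutoff).tolist())
        # Step 7: stop once the pool no longer changes.
        if not new_pool or new_pool == pool:
            break
        pool = new_pool
    return pool, r
```

Called with a response matrix and a starting list of item indices, the function returns the stabilized pool together with the last set of item-pool correlations. A fuller version would also drop an item that correlates as highly with another pool, as step 5 describes.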


Results 


Results of the analysis are given in Table 
2. The table presents the item stems and 
item-pool correlations. In order to conserve 
space only the two extremes of the response 
continua are shown. The pool to which each 
item in Table 2 belongs is identified by the 
roman numeral preceding the item. The cor- 
responding roman numerals and item pool 
names are given in the following discussion. 
The supervisory dimensions will be given first, 
followed by discussion of group dimensions. 
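
The item-pool correlations reported in Table 2 went through the conversions of step 9. The sketch below illustrates those conversions under assumptions of our own: it uses the standard algebraic relation between biserial and point-biserial coefficients and the usual algebraic correction for part-whole overlap, whereas the article read the correction from Guilford's abac (5). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal curve, mean 0 and sd 1


def _ordinate(p):
    """Height of the normal curve at the point cutting off proportion p."""
    return _STD_NORMAL.pdf(_STD_NORMAL.inv_cdf(p))


def biserial_to_point_biserial(r_bis, p):
    """Convert a biserial r to a point-biserial r for an item with pass rate p."""
    return r_bis * _ordinate(p) / sqrt(p * (1 - p))


def point_biserial_to_biserial(r_pb, p):
    """Inverse of the conversion above."""
    return r_pb * sqrt(p * (1 - p)) / _ordinate(p)


def part_whole_corrected(r_it, sd_item, sd_total):
    """Correlation of an item with the total of the remaining items.

    r_it     : item-total correlation with the item included in the total
    sd_item  : standard deviation of the item
    sd_total : standard deviation of the total score
    """
    numerator = r_it * sd_total - sd_item
    denominator = sqrt(sd_total ** 2 + sd_item ** 2
                       - 2 * r_it * sd_total * sd_item)
    return numerator / denominator
```

For an item dichotomized at the median (p = .50), a biserial coefficient of .80 corresponds to a point-biserial coefficient of about .64, and removing the item's own contribution from the total always lowers the coefficient.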

Supervisory dimensions. Lack of Arbitrari- 
ness (I) is the strongest pool on the basis of 
item-total correlations. This pool refers to a 
supervisor’s willingness to accept contrary 
opinions from his subordinates and tendency 
to refrain from expressions of hostility when 
his employees express views different from 
his own. 


Communication (II) reflects the extent to 
which the supervisor passes on company in- 
formation to his employees and keeps them 
informed as to how they stand in his graces. 
The supervisor who converses freely and fre- 
quently with his employees on the job can 
be expected to be rated highly on this test. 

Safety Enforcement (III) measures sub- 
ordinates’ opinion of the supervisor’s concern 
over the safety practices of his men. The 
supervisor rated highly on this pool of items 
is one who promotes the strict observance of 
safety rules by setting a personal example 
and by bringing to the attention of his men 
infractions of safety regulations. 

Social Nearness (IV) reflects the frequency 
of the off-the-job social contacts of the su- 
pervisor with his subordinates and the ex- 
tent to which he enters into personal friend- 
ships with them. This group of items is es- 
sentially a rating of the supervisor’s soci- 
ability with respect to his subordinates. 

Group dimensions. The Absence of Dis- 
sension (V) pool is represented by a group 
of items related to the congeniality of the 
work group. These items are all phrased so 
as to elicit opinions about the extent of quar- 
reling and bickering between individuals and 
groups of people within the work unit. The 
response continua were arranged so that a 
high weight reflected the absence rather than 
the presence of unharmonious relations among 
the employees. Consequently, this dimen- 
sion does not necessarily reflect the presence 
of cordial relations among members of the 
work unit, but merely the absence of mani- 
fest expressions of hostility. 

Informal Control (VI) is a measure of the 
influence of workers outside the formally 
delegated hierarchy of authority. A high 
score on this pool suggests that leadership 
in a work unit is arrogated by one or more 
of the forceful workers when the supervisor 
does not exercise vigorous leadership. 

Group Unity (VII) is essentially a meas- 
ure of morale. It reflects the tendency of 
people in the work unit to stick together as 
a group and to rely upon collective action of 
the unit to obtain benefits for all. The em- 
phasis in this dimension is upon an identifi- 
cation with the group and the feeling that the 





R. C. Wilson, W. S. High, and A. L. Comrey 


Table 2 


Correlations of Items with Item Pools 


Items    I    II    III    IV 


Supervisor Dimensions 


If you or another worker offer him a good idea he will use it if possible: 1. rarely . . . 5. always    77 38 
He is willing to listen to your ideas: 1. never . . . 5. always    83 50 
He hates to have employees disagree with him: 1. always . . . 5. never    88 
He gets sore if employees question his orders: 1. always . . . 5. never    85 
He puts into practice good suggestions of employees: 1. never . . . 5. always    80 
He thinks the employees have no right to question his actions: 1. always . . . 5. never    82 38 
He seems willing to pass on nonsecret information he gets from higher up: 1. never . . . 5. always 
You know where you stand with him: 1. never . . . 5. always 
He keeps to himself information the employees would like to have: 1. always . . . 5. never 
His employees could do a better job if he would let them know more about what was going on: 1. definitely . . . 5. not at all 
Things would run more smoothly if he let employees in on more information from higher up: 1. very much . . . 5. not much 
He passes on interesting bits of information he gets from the front office: 1. never . . . 5. very frequently 
He tries to see that safety rules are observed: 1. almost never . . . 5. very frequently 
For not observing good safety practices on the job you would be penalized in some way by him: 1. not at all . . . 5. definitely 
He observes the safety rules himself: 1. almost never . . . 5. always 
When a major safety rule is broken he is concerned: 1. not much . . . 5. very much 
He tells you about it when you break a minor safety rule as well as the major ones: 1. never . . . 5. always 
He has had parties for groups of employees at his home: 1. never . . . 5. frequently 
You see him socially after working hours: 1. never . . . 5. frequently 
He has close friends among his employees: 1. none . . . 5. four or more 










Table 2 (Continued) 


Items    I    II    III 


Supervisor Dimensions (Continued) 


You know him personally: 1. not at all . . . 5. very much    05 -10 -12 
He has invited you to his home: 1. never . . . 5. frequently    18 -08 
He “goes out” with the boys in the unit: 1. never . . . 5. frequently    -05 -23 
He associates with his employees during off hours: 1. not at all . . . 5. very much    03 
He avoids making close friends among his employees: 1. very much . . . 5. not at all    35 


Group Dimensions 


People in your unit are continually mad at each other: 1. very much . . . 5. not at all    40 10 
There are people in your unit who refuse to speak to each other: 1. several . . . 5. none    32 
There are people in your unit who seem to have it in for somebody else: 1. several . . . 5. none 
Certain groups of people in your unit have it in for each other: 1. very much . . . 5. not at all 
There is bad feeling between groups of people in your unit: 1. very much . . . 5. not at all 
Employees who try too hard to get ahead are disliked by the workers in your unit: 1. very much . . . 5. not at all 
When men in the unit get angry at each other, they hold a grudge: 1. almost always . . . 5. never 
There are people in your unit who are hard to get along with: 1. many . . . 5. none 
People in your unit quarrel with each other: 1. very much . . . 5. not at all 
V-10. There is bickering and fighting among employees in your unit: 1. very much . . . 5. not at all 
VI-1. The opinions of popular workers in your unit have power over the other employees: 1. not at all . . . 5. very much 
There are certain workers in your unit besides the boss, who seem to lead the others: 1. not at all . . . 5. very much 
Some workers in your unit, besides the boss, control the actions of other employees: 1. not at all . . . 5. very much 
What you do and say is influenced by the opinions of your fellow employees: 1. not at all . . . 5. very much 
The word of the “old timers” in your unit carries weight with the other workers: 1. not at all . . . 5. very much 
Your unit is a friendly place to work: 1. not at all . . . 5. very much 










Table 2 (Continued) 


Items    I    II 


Group Dimensions (Continued) 


People in your unit act as a group to get things they want: 1. never . . . 5. very frequently 
People in your unit seem to “hang together” like a team: 1. not at all . . . 5. very much 
If the workers in your unit were angry at something the supervisor did, they would take action as a group to let him know how they felt: 1. not at all . . . 5. definitely 
VII-5. The morale in your unit is: 1. below average . . . 5. outstanding 
VIII-1. You care whether or not the people in your unit do a good job: 1. not at all . . . 5. very much 
VIII-2. It makes you mad when someone in your unit doesn’t try to do a good job: 1. never . . . 5. always 
You are proud of the work record of your unit: 1. not at all . . . 5. very much 
Other units you know about do a better job than yours: 1. very much . . . 5. not at all 
You have told friends that your unit is a good outfit: 1. never . . . 5. very frequently 

50 
39 
00 
26 
34 
31 

24 
40 
21 
40 13 
39 37 14 14 14 37 


Note. The roman numerals in the first column identify the item pool to which the corresponding item belongs. The discussion of any given item pool may be found in the text by reference to these numerals. The items are numbered consecutively within each pool. All items are listed in the second column. In order to conserve space only the two extremes of the response continua actually used in the inventory are reported. Entries in the columns beneath the roman numerals are biserial correlation coefficients for the corresponding item with each of the eight item pools. Decimal points have been omitted. The italicized entry in each row is the correlation of the corresponding item with the item pool, or dimension, to which it belongs and has been corrected for part-whole correlation. 


unit serves the interests of all its members. 

The Pride in Work Group (VIII) test is 
the weakest of the eight from the point of 
view of reliability, and for this reason its fur- 
ther use in the present form cannot be rec- 
ommended. The items in this pool were in- 
tended to reflect a feeling of pride in the 
group as a production unit. Items VIII-3 
and VIII-5 most nearly approximate what 
this dimension was intended to measure. In 
a study as yet unpublished, this dimension 
was revised with the remaining items omitted 
and two new items more similar to VIII-3 
and VIII-5 added. These were: “You would 
rather work with your present unit than with 
any other: 1. definitely not . . . 5. very defi- 
nitely;” and, “As compared with other units 
in the company, yours is: 1. below average 
. . . 5. outstanding.” The reliability esti- 
mate of the revised Pride in Work Group di- 
mension was .75 for the new sample. 


Table 3 


Item-Group Intercorrelations 


    II    III    IV    V    VI    VII    VIII 
[table entries not recoverable] 
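
The reliability estimates quoted for the item pools are split-half correlations corrected for double length. A minimal sketch of that correction, the Spearman-Brown formula for a test of doubled length, is given here. The odd-even division of items is an assumption made for illustration, since the article does not say how the halves were formed, and the function names are hypothetical.

```python
import numpy as np


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Split-half reliability corrected for double length.

    item_scores : (n_people, n_items) array of response weights.
    The halves are the odd- and even-numbered items (an assumption).
    """
    x = np.asarray(item_scores, dtype=float)
    odd_half = x[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
    even_half = x[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...
    r_half = np.corrcoef(odd_half, even_half)[0, 1]
    return spearman_brown(r_half)
```

A correlation of .60 between the two halves, for example, steps up to a full-length estimate of .75.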





Summary 


1. Eight relatively independent and homo- 
geneous groups of items pertaining to super- 
visory practices and group interactions have 
been presented. The supervisory dimensions 








are Lack of Arbitrariness, Communication, 
Safety Enforcement, and Social Nearness. 
The group dimensions are Congenial Work 
Group, Informal Control, Group Unity, and 
Pride in Work Group. 

2. These item groups were derived using a 
modification of the Wherry-Gaylord iterative 
analysis procedure. 

3. These dimensions will be useful in vali- 
dation studies and morale surveys where in- 
dependent and homogeneous item groups with 
known reliabilities and intercorrelations are 
needed. These dimensions may be supple- 
mented with additional item groups to meet 
the needs of particular experimental situa- 
tions. 


Table 4 


Means, Standard Deviations, and Reliability Estimates* of Item Groups 


Statistic    III 
Mean         20.66 
σ            5.32 5.5. 3.03 
r            .67 

Statistic    IV 
Mean         16.20 
σ            5.35 
r            .76 


* Split-half correlations corrected for double length. 


Received April 5, 1954. 


References 


1. Comrey, A. L., Pfiffner, J. M., & Beem, Helen 
P. Factors influencing organizational effec- 
tiveness. I. The U. S. forest survey. Per- 
sonnel Psychol., 1952, 5, 307-328. 

2. Comrey, A. L., Pfiffner, J. M., & Beem, Helen 
P. Factors influencing organizational effec- 
tiveness. II. The department of employment 
survey. Personnel Psychol., 1953, 6, 65-79. 

3. Flanagan, J. C. A table for obtaining the bise- 
rial correlation coefficient. Pittsburgh: Ameri- 
can Institute for Research, 1950. 

4. Flanagan, J. C. The effectiveness of short 
methods for calculating correlation coeffi- 
cients. Psychol. Bull., 1952, 49, 342-348. 

5. Guilford, J. P. The correlation of an item with 
a composite of the remaining items in a test. 
Educ. psychol. Measmt, 1953, 13, 87-93. 

6. Thurstone, L. L. Multiple-factor analysis. Chi- 
cago: Univer. of Chicago Press, 1947. 

7. Wherry, R. J., & Gaylord, R. H. The concept 
of test and item reliability in relation to fac- 
tor pattern. Psychometrika, 1943, 8, 247-269. 

8. Wherry, R. J., Perloff, R., & Campbell, J. T. 
An empirical verification of the Wherry-Gay- 
lord iterative factor analysis procedure. Psy- 
chometrika, 1951, 16, 67-74. 

9. Wilson, R. C., Beem, Helen P., & Comrey, A. L. 
Factors influencing organizational effectiveness. 
III. A survey of skilled tradesmen. Personnel 
Psychol., 1953, 6, 313-325. 

10. Wilson, R. C., High, W. S., Beem, Helen P., & 
Comrey, A. L. A factor-analytic study of 
supervisory and group behavior. J. appl. 
Psychol., 1954, 38, 89-92. 





The Journal of Applied Psychology 
Vol. 39, No. 2, 1955 


The Relationship of Human Interest to Immediate Reten- 
tion and to Acceptability of Technical Material 1 


George R. Klare, James E. Mabry, and Levarl M. Gustafson 


University of Illinois 2 


This study is the third in a series on the 
relationship of various communication vari- 
ables to the learning of technical training ma- 
terial. “Human interest,” as used here, has 
been defined in terms of those aspects of 
writing measured by Flesch’s human interest 
formula (1). These are, briefly, “personal 
words” (e.g., words referring to people) and 
“personal sentences” (e.g., quoted sentences 
and sentences grammatically directed to the 
reader). It has been predicted that written 
material that is high in human interest, as 
contrasted with material that is low, should 
be more acceptable because it appears more 
personal. Increased acceptability has been 
thought to be found in at least one study 
(2). Increased comprehension (immediate re- 
tention) has also been inferred from an ex- 
pected relationship between the readability 
and acceptability of material. 

The present writers felt that, if content 
and style difficulty were held constant in the 
material used, human interest would be un- 
likely to produce a change in immediate re- 
tention. Further, the writers felt that in- 
creased acceptability may or may not be 
found here, inclining toward the latter al- 
ternative because of the traditionally imper- 
sonal nature of technical writing. This study 

1 This research was supported in part by the 
United States Air Force under Contract No. AF 
33(038)-25726, monitored by the Personnel and 
Training Research Center. Permission is granted 
for reproduction, translation, publication, use, and 
disposal in whole and in part by or for the United 
States Government. 

2 Dr. Klare is now at Ohio University, Dr. Mabry 
at VA Hospital, Salt Lake City, Utah, and Dr. 
Gustafson at Oklahoma A & M College. 

3 For a more complete presentation of this and 
other studies in the series, see reference 4. Grateful 
acknowledgment is made of the contributions of the 
following persons to these studies: (a) Drs. L. H. 
Lanier and L. M. Stolurow, University of Illinois; 
(b) E. K. Farson, F. E. Gregory, J. H. Orr, T. E. 
Carter, C. S. Hershey, and others, of Chanute Air 
Force Base, Illinois; and (c) Dr. D. Dubois and 
Lt. M. Grunzke, Sampson Air Force Base, New 
York. 


was an attempt to relate human interest to 
these two dependent variables and also to 
amount read in a given time. 


Method 


A 1,206-word multilith-printed lesson from an air- 
craft mechanics training course at Chanute Air Force 
Base was used in this study. It consisted of a first 
half on the “induction system” and a second half on 
the “cooling system” of an aircraft engine. 

Two levels of human interest were used, the levels 
varying in (a) percentage of “personal words” and 
(b) percentage of “personal sentences.” The “high” 
level received a score of 28 (“interesting”) using the 
Flesch formula, while the “low” level received a 
score of zero (“dull”). 
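
For readers unfamiliar with the scale, the human interest score can be illustrated with a short sketch. The coefficients and the descriptive bands are those usually given for Flesch's 1948 formula (reference 1) and are stated here from general knowledge rather than from this article. The function names and the example percentages are hypothetical, and counting "personal words" and "personal sentences" in a text is left aside.

```python
def human_interest(pct_personal_words, pct_personal_sentences):
    """Flesch human interest score.

    pct_personal_words     : personal words per 100 words
    pct_personal_sentences : personal sentences per 100 sentences
    """
    return 3.635 * pct_personal_words + 0.314 * pct_personal_sentences


def describe(score):
    """Descriptive label for a human interest score (bands as usually cited)."""
    if score < 10:
        return "dull"
    if score < 20:
        return "mildly interesting"
    if score < 40:
        return "interesting"
    if score < 60:
        return "highly interesting"
    return "dramatic"


# Hypothetical text with 5 per cent personal words and 30 per cent
# personal sentences; the figures are illustrative only.
example_score = human_interest(5, 30)
example_label = describe(example_score)
```

A text with no personal words or sentences scores zero ("dull"), which matches the low level used here, and a score of 28 falls in the "interesting" band, which matches the high level.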

Four versions of the lesson were used in the study. 
Two were termed “unsplit,” since both halves of 
the lesson were at the same level of human interest; 
these were called HIC (both halves were at the high 
level) and Present (the standard version of the les- 
son used in the course; both halves were at the low 
level). Two versions were termed “split,” since the 
two halves differed in level; these were called HIA 
(first half high, second half low) and HIB (the re- 
verse of these treatments). 

In preparing these versions, an attempt was made 
to change only human interest. Technical experts, 
serving as judges, determined that neither content 
nor technical terms were changed. In order to avoid 
varying the structure of the material other than in 
human interest (particularly in style difficulty), the 
HI presentation was allowed to become 77 words 
longer than Present (a check on style difficulty using 
two readability formulas showed no change). All 
versions had the same format. 

Every fifth line of each version was numbered to 
help in getting a measure of amount read by the 
subjects (Ss). Each S made a tally mark for each 
complete reading during the experimental period, 
and indicated the line he was reading when asked 
to stop. 

A 50-item multiple-choice test was used to meas- 
ure immediate retention. The items were selected 
from a pool of 112 items in such a way that each 
paragraph of the reading material was allotted a 
percentage of items proportional to its size. The 
split-half reliability of the test was .87. 
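
The proportional allotment of test items to paragraphs can be sketched as follows. This is an illustration only: the article does not say how fractional allotments were rounded, so a largest-remainder rule is assumed, and the function name and the paragraph sizes in the usage note are hypothetical.

```python
def allocate_items(paragraph_sizes, n_items):
    """Allot n_items to paragraphs in proportion to their sizes.

    paragraph_sizes : word (or line) count of each paragraph
    n_items         : total number of test items to allot
    Fractions are resolved by the largest-remainder rule (an assumption).
    """
    total = sum(paragraph_sizes)
    exact = [n_items * size / total for size in paragraph_sizes]
    counts = [int(share) for share in exact]       # whole items first
    leftovers = n_items - sum(counts)              # items still to place
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:leftovers]:
        counts[i] += 1
    return counts
```

For instance, four paragraphs of 300, 200, 100, and 606 words (1,206 words in all) would receive 13, 8, 4, and 25 of the 50 items under this rule.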

Acceptability of the presentations was determined 
by the answers to three questions. The first asked 
if Ss noticed a difference in the way the two halves 
of the lesson were written (they were asked not to 





Human Interest and Technical Material 


answer in terms of content). Those who noticed 
a difference were asked which half of the material 
they thought “easier to read” (question 2) and 
which half “more pleasant to read” (question 3). 
Those who had the unsplit versions as well as those 
who had the split versions were given these ques- 
tions. 

Subjects and procedure. A total of 440 male air- 
men in indoctrination training at Sampson Air Force 
Base were used as Ss (110 Ss for each version). 
Aptitude indices, based on the Airman Classification 
Test Battery, were available as stanine scores for 
all Ss. 

The Ss were given 20 minutes to read the lesson 
material, during which period they indicated amount 
read as previously described. They then answered 
the three questions designed to determine accepta- 
bility of the presentations. Following this, 40 min- 
utes were allowed for answering the test. 


Results 


Amount read (20 minutes). Analyses of 
these data showed that the HIC version 
(high level of human interest) resulted in 
about 3 per cent more lines and about 3 per 
cent more words read than the Present (low 
level) version. The differences, however, 
were not significant. 

Acceptability. The three questions (“did 
the halves differ,” “which half easier to 
read,” and “which half more pleasant to 
read”) were next analyzed to determine the 
acceptability of the presentations. Despite 
instructions to the contrary, it was found that 
Ss had answered on the basis of content as 
well as manner of presentation. This was 
shown by the fact that the “cooling system” 
material (second half of the lesson) was 
judged easier and more pleasant to read than 
the “induction system” material (first half) 
whichever level of human interest (high or 
low) was used. 

However, the split versions permitted 
equating for content, and they were then 
analyzed, using the formula for the signifi- 
cance of the difference of percentages. Table 
1 presents the percentages of Ss preferring 
each half of the lesson, with arrows indicat- 
ing percentages totaled and averaged. The 
reason for this procedure was to obtain the 
percentage for the total lesson at one human 
interest level to compare to the percentage 
for the total lesson at the other human inter- 
est level. As Table 1 shows, the low level 
was preferred to the high, but the differences 
did not quite reach significance. Tetrachoric 
correlation coefficients between judgments of 
“easier to read” and “more pleasant to read” 
ranged from .82 to .94. 


Table 1 


Percentages of Ss Who Indicated One Half of Lesson 
Easier and One Half More Pleasant, Based on 
Comparisons of Two “Split” Versions 


            Percentage Indicating     Percentage Indicating 
            Each Half Easier          Each Half More Pleasant 
Version*    1st Half    2nd Half      1st Half    2nd Half 

[surviving entries: 25 7; 35 65; CR = 1.49] 


* HIA version = first half, high human interest; second half, 
low. HIB version = first half, low human interest; second 
half, high. 
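
The critical ratio (CR) for a difference between two percentages can be illustrated with a short sketch. The pooled-proportion form below is the common textbook version and is an assumption, since the article does not print the formula it used. The function name is hypothetical and the example figures are not the study's data.

```python
from math import sqrt


def critical_ratio(p1, n1, p2, n2):
    """Critical ratio for the difference between two independent proportions.

    p1, p2 : proportions (for example, of Ss preferring a given half)
    n1, n2 : numbers of Ss on which each proportion is based
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)              # common proportion
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# Hypothetical figures: 60 per cent of 100 Ss against 50 per cent of 100 Ss.
example_cr = critical_ratio(0.60, 100, 0.50, 100)
```

A ratio below 1.96 in absolute value does not reach the .05 level on a two-tailed test, which is the sense in which a CR near 1.5 falls short of significance.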

A further analysis was then made of the 
comparative percentage data, in an attempt 
to determine the consistency of the prefer- 
ences. This involved comparing the percent- 
age preferring the first half of one version to 
the percentage preferring the first half of an- 
other version. For example, if the percent- 
age preferring the first half of HIA (first 
half high, second half low) were to be com- 
pared to the percentage preferring the first 
half of HIB (first half low, second half 
high), the latter percentage would be ex- 
pected to be greater because the low level of 
human interest is preferred to the high level. 
If, similarly, the first half of HIB were to be 
compared to the first half of Present (both 
halves low), the former percentage would be 
expected to be greater, since a more definitive 
comparison is afforded the subject. 

For both the second (“which half easier”) 
and third (“which half more pleasant”) ques- 
tions, agreement with the above expectation 
was perfect (5 out of 5), yielding a value for 
each of 2.33, .03 > p > .02 (using the sign 
test). These analyses indicate that Ss were 
consistent in judging the low level, or im- 
personal treatment, easier and more pleasant 
to read than the high level, or personal treat- 
ment. 

Immediate retention test score. Analysis 
of the scores on the 50-item test showed that 
the mean score for Present was less than 1 
per cent higher than the mean score for HIC. 
This difference was not significant. The mean 
mechanical aptitude (stanine) score for the 
Present Ss, however, was higher than that for 
the HIC Ss (the values were 5.13 and 4.73, 
respectively), and it had previously been 
found that aptitude score correlated highly 
with immediate retention test score. Apti- 
tude score-test score correlations computed 
on these data were .75 for the Ss who had the 
Present version and .68 for those who had 
the HIC version. 

Because of the importance of aptitude dif- 
ferences, comparable samples of 100 Present 
and HIC Ss, stratified on the basis of apti- 
tude scores, were pulled for analysis. A t 
ratio computed on immediate retention test 
scores then yielded a value of .81, .50 > 
p > .10, showing no significant difference in 
means. 

Two 2 × 2 analyses of variance were also 
computed, using test scores of stratified sam- 
ples of 50 Ss who had the split versions. The 
first analysis involved the use of scores on 
the first half of the test (which was based on 
the first half of the lesson material); the 
second analysis involved scores on the second 
half of the test. Table 2 presents the analy- 
sis of first-half scores, and shows that the 
variance attributable to human interest is 
again not significant. A similar result was 
obtained from the analysis of second-half 
scores. 


Table 2 


Analysis of Variance Based upon First-Half Test Scores 
of Stratified Samples of 50 Ss (Present, 
HIA, HIB, and HIC Versions) 


Source of          Sum of               Mean 
Variation          Squares      df      Square     F 

Human interest        21.78       1      21.78     1.39, p > .05 
Position               7.22       1       7.22      .46, p > .05 
Interaction             .32       1        .32      .02, p > .05 
Within             3,077.56     196      15.70 


Discussion 


Before carrying out the study reported in 
this paper, a pilot study using the same ex- 
perimental materials was carried on in the 
classes of airmen in advanced technical train- 
ing at Chanute Air Force Base. The results 
of the pilot study were similar to those of 
this study in showing no significant differ- 
ence in immediate retention test scores for 
the high and low levels of human interest and 
in showing the latter somewhat more accept- 
able than the former. The results were also 
similar in that they showed more lines and 
words read for the high level than for the 
low level. Whereas in the present study the 
differences in amount read were not signifi- 
cant, in the pilot study the differences 
reached significance in one analysis and near 
significance in another. 

Thus the results of this study appear pos- 
sibly related to an earlier study of style diffi- 
culty (4). In that study it appeared that 
subjects may have been “compensating” for 
a harder and less pleasant style by expending 
more “effort” and achieving somewhat higher 
scores. In the present study, the high level 
of human interest was consistently judged 
harder and less pleasant to read than the 
low, yet it appears that subjects may have 
again compensated to some extent by read- 
ing slightly more of the high level material in 
the allotted time. Further work is, of course, 
necessary before this hypothesis can be more 
definitely stated. 

Apart from this, however, the finding that 
the high level of human interest was less ac- 
ceptable than the low is interesting in itself. 
It seems to run counter to most predictions, 
but does come near to being supported by 
Ludwig (5). It suggests that subjects might 
have definite “expectations” regarding the na- 
ture of technical writing. Most technical 
material is either low or entirely lacking in 
human interest, presumably to lend an im- 
personal or objective flavor to it. It seems 
possible here that subjects did not like a per- 
sonal style because they had not been led, 
by previous experience, to expect one. 

It should be emphasized that this finding 
does not necessarily apply to nontechnical 
nonfiction (e.g., the New Yorker’s “Talk of 







the Town”).4 It would, further, seem highly 
unlikely to apply to fiction, since personal 
words constitute part of, at the least, almost 
all fiction. It should also be emphasized, as 
has been pointed out to the authors, that 
subjects who have had some experience with 
personalized technical material (e.g., through 
training in which such material was used) 
might well find such material equally or more 
acceptable than impersonal technical mate- 
rial. 


Conclusions 


The results of this study indicate that a 
high level of human interest (“personal” 
treatment) of technical writing produced no 
significant difference in immediate retention 
test score, was consistently judged less ac- 
ceptable, and showed a tendency to produce 
a greater amount read in a given time, com- 
pared to a low level of human interest (“im- 
personal” treatment). 


4 For a discussion of this point see reference 3. 


The study also indicates the great impor- 
tance of content in determining the accept- 
ability of material and the high relationship 
between judgments of material as “easier to 
read” and “more pleasant to read.” 


Received March 19, 1954. 


References 


1. Flesch, R. A new readability yardstick. J. appl. 
Psychol., 1948, 32, 221-233. 

2. Flesch, R. Reader comprehension of news stories: 
further comment. Journalism Quart., 1951, 
28, 496-497. 

3. Klare, G. R., & Buck, B. Know your reader. 
New York: Hermitage House, 1954. 

4. Klare, G. R., Mabry, J. E., & Gustafson, L. M. 
The relationship of verbal communication 
variables to immediate and delayed retention 
and to acceptability of technical training ma- 
terials. USAF, Personnel & Training Res. 
Cent., Res. Bull., 1954, No. 54-103. 

5. Ludwig, M. C. Hard words and human interest: 
their effects on readership. Journalism Quart., 
1949, 26, 161-171. 





The Journal of Applied Psychology 
Vol. 39, No. 2, 1955 


Respondents Rate Public Opinion Interviewers 1 


J. Marshall Brown 2 
Lafayette College 


Individuals concerned with opinion research 
generally recognize the importance of the in- 
terviewers’ work, but evaluating this work has 
always been difficult. Although many studies 
show that interviewers can and frequently 
do bias respondents’ answers, relatively few 
studies have reported an attempt to find the 
cause of the bias.3 Only rarely have studies 
been reported in which the relationship be- 
tween the interviewer and respondent during 
the interview was measured. 

Guest (14) reported a study in which he 
analyzed recordings of a standard interview, 
in which he had one coached respondent give 
set answers to a number of different inter- 
viewers. Although there was little relation- 
ship between the respondent ratings and 
judges’ opinions of the recorded interviews, 
Guest suggested that further refinement of 
the rating scale used by the respondent and 
more cases would probably yield more sig- 
nificant results. 

Bennett (2) reported the use of a fol- 
low-up post card to respondents as a check to 
see if the interviewers could make an over- 
all “dependability rating” of the respondents’ 
answers, but Bennett did not report the use 
of respondent ratings for evaluating inter- 
viewers’ work. 

This article reports the use of a respondent 
rating scale for measurement of the inter- 
viewer-respondent relationship. 


1 This article is based on a Ph.D. dissertation done 
under Professor L. P. Guest. The dissertation is on 
file in the Pennsylvania State University Library un- 
der the title, “The Development and Testing of a 
Respondent Rating Scale for Opinion and Market 
Research Interviewers.” A microfilm copy of the 
complete manuscript is obtainable from University 
Microfilms, Ann Arbor, Michigan, as publication 
number 3301. 

2 The author wishes to express his gratitude to the 
National Opinion Research Center whose coopera- 
tion made this study possible. A special word of 
thanks is due Mr. Paul B. Sheatsley, Eastern Rep- 
resentative. 

3 Articles suggesting interviewer bias are references
3, 5, 7, 10 through 21, 23, 24. Several articles
studying causes of bias are 3, 7, 10, 11, 14, 15, 18, 
21, 23, 24. 


Procedure 


The respondent rating scale. Character- 
istics of good interviewing were collected from 
references (1, 4, 6, 8, 9, 22) and from con- 
versations with experts in the field of inter- 
viewer selection. The author, in conjunction 
with opinion research directors, constructed 
items which allowed respondents to rate 
characteristics of the interviewers’ work, 
qualities of the questionnaire, and respond- 
ents’ feelings at the time of the interview. 
After pretesting and revision these items 
formed the Respondent Rating Scale (R. R. 
Scale) as shown in Fig. 1. Included in Fig. 
1 are statistics showing percentages of re- 
spondents indicating each alternative and 
ratios of usable answers to eligible questions 
given by respondents. 

Through the cooperation of the National 
Opinion Research Center (NORC), the R. R. 
Scale was used with one of their periodic 
polls. At the end of the interview, the inter- 
viewer was instructed to hand the respondent 
the R. R. Scale enclosed in an envelope, while 
explaining its purpose. He was to elicit as 
much cooperation from the respondent as pos- 
sible, but not to fill out the form. The inter- 
viewer was to wait until the respondent com- 
pleted the R. R. Scale and returned it to him. 

While the respondent was completing the 
R. R. Scale, the interviewer was completing 
the factual data and interviewer items on the 
questionnaire. After collecting the R. R. 
Scale, the interviewer checked whether the re- 
spondent filled it out, was unable to do so, or 
refused to fill it out. 

The survey. A representative sample of 
1,276 adults was interviewed by the regular, 
trained NORC interviewers. The interview- 
ers were part-time workers, primarily women, 
with varying amounts of interviewing experi- 
ence. Details for administration of the R. R. 
Scale were included in the instructions for the 
survey. (The questionnaire used in this sur- 
vey was one of NORC’s periodic studies and 







CONFIDENTIAL

NATIONAL OPINION RESEARCH CENTER
University of Chicago

Please put a check in the box next to the sentence which tells best how you felt
about being interviewed. Put this paper back in the envelope and seal it, if you
wish. The interviewer will mail it for you. Your answers will not affect the
interviewer's job in any way, but will help us to improve our work on future
surveys. Thank you.

[Signature]
Director


CHECK THE ONE ANSWER FOR EACH GROUP THAT TELLS BEST WHAT THE INTERVIEW WAS LIKE.


How much did you enjoy the interview?
( ) I didn't enjoy it at all
( ) I enjoyed it very little
( ) I enjoyed it somewhat
( ) I enjoyed it very much
No answer

( ) Interviewer and I talked about other things from time to time
    while doing the survey
( ) Interviewer didn't talk to me much about anything except the survey
No answer

It seemed to me that the interview took
( ) Hardly any time at all
( ) Very little time
( ) A fair amount of time
( ) A great deal of time
No answer

( ) Interviewer went right into the questions without telling me too
    much of what the survey was about
( ) Interviewer took a lot of time to explain the purpose of the survey     34
No answer     10

Getting people's opinions in public opinion surveys like this is
( ) Very useful
( ) Somewhat useful
( ) Not very useful
( ) Not useful at all
No answer

Did the interviewer come at a good time for you, or would it have been better if
the interviewer had come at a different time?
( ) It was a very good time for me
( ) It was a fairly good time for me
( ) Another time would have been somewhat better
( ) Another time would have been much better
No answer

( ) When I said I didn't know the answer to a question, the interviewer
    repeated the questions, trying to get an answer from me     66
( ) When I said I didn't know the answer to a question, the interviewer
    went right on to the next question     22
No answer     7

( ) Interviewer made me feel I was free to give the first answer that
    came into my mind
( ) Interviewer made me feel that I should think carefully before answering
No answer

In general, which one of the things in this list best describes the way you
felt during the interview? CHECK ONE ONLY

It was like:
( ) Taking an intelligence test
( ) Being on the witness stand in court
( ) Having a political argument
( ) Voting the way I feel in an election
( ) Having a friendly discussion
( ) Answering the questions on some government form
No answer

( ) Interviewer's opinions seemed to be pretty much like mine
( ) Interviewer's opinions seemed to be pretty different from mine
( ) Interviewer didn't seem to have any opinions of his own
No answer

Do you expect the United States to fight in another war within the next ten years?
( ) Yes
( ) No
( ) Don't know


USE THE OTHER SIDE OF THIS PAGE FOR COMMENTS IF YOU WISH





Note—The percentages, ratios, and phrases "no answer" have been added for this 
article and did not originally appear on the Respondent Rating Scale. 


Fig. 1. The Respondent Rating Scale for opinion interviewers. The illustration also shows percentages
of respondents who checked each alternative or gave no answer and, for each item, the ratio of usable
answers given to eligible questions on the survey questionnaire.


was considered of average length and diffi- 
culty by the staff of NORC.) 

Handling of the data. The frequencies of 
respondents’ replies to every item on the 
R. R. Scale were compared with: 


1. A subjective numerical rating for the past year’s 
work, given interviewers by NORC personnel. 


2. The serial position of the interview in relation 
to the total number of interviews done by one in- 
terviewer. 

3. The actual number of surveys for NORC in 
which interviewers had participated. 

4. A ratio of the number of refusals to the total 
number of interviews completed on this survey. 

5. The number of “don’t know” answers obtained 
which were not considered legitimate answers. 





Table 1

Significant Contingency Coefficients Between Respondents’ Answers on the Respondent Rating Scale and
Three Criteria of Interviewing and Three Interviewer Ratings

Column headings

Interviewing Criteria:
    Experience (Number of Surveys)
    Refusal Rate
    Number of “don’t knows”
Interviewers’ Ratings:
    Interviews More Enjoyable
    Respondents More Cooperative
    Respondents More Frank and Honest

Respondent Rating Scale Items (rows)

How much did you enjoy . . .
It seemed to me that the interview took . . .
Getting people’s opinions in public opinion surveys . . .
Did the interviewer come at a good time . . .
In general, which one of the things on this list . . .
Interviewer and I talked about other things . . .
Interviewer went right into the questions . . .
When I said I didn’t know the answer . . .
Interviewer made me feel I was free to give . . .
Interviewer’s opinions seemed to be . . .

Legible entries (cell positions not recoverable): .34**, .35**, .23**, .29**, .28**, .29**, .29**

* Significant at the .05 level.
** Significant at the .01 level.


6. A ratio of the total number of usable answers 
to the total number of questions where there should 
have been an answer. 


The above items were compared with re- 
spondents’ replies to every item on the R. R. 
Scale because they were methods of estimat- 
ing interviewer competence or were thought 
to have an influence on interviewing. Sig- 
nificance of relationship between the R. R. 
Scale items and characteristics of the inter- 
viewers was determined by the chi-square 
technique. The technique was also used to 
test for reliability of the relationship between 
the items on the R. R. Scale and interviewers’ 
replies to certain items on the questionnaire. 
For significant relationships, coefficients of 
contingency were computed to indicate the 
amount of relationship. The ratios of the 
number of refusals to the total number of in- 
terviews completed in this survey were pre- 
sented without tests of significance, since no 
suitable test could be found. 
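
The chi-square and contingency-coefficient computation described above can be sketched as follows. The article does not print its formulas, so this sketch assumes the ordinary Pearson chi-square statistic and Pearson's coefficient of contingency, C = sqrt(chi2 / (chi2 + N)). The function names and the 2 x 2 table of counts are hypothetical illustrations, not data from the study.

```python
import math

def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def contingency_coefficient(table):
    """Pearson's C = sqrt(chi2 / (chi2 + N)); an assumed formula, not quoted from the article."""
    n = sum(sum(row) for row in table)
    chi2 = chi_square(table)
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical counts: rows = two answers to a scale item,
# columns = few / many nonlegitimate "don't know" answers.
example = [[30, 10],
           [10, 30]]

print(chi_square(example), contingency_coefficient(example))
```

For a 2 x 2 table there is one degree of freedom, so chi-square must exceed 6.635 to be significant at the .01 level. Because C does not carry a sign, the direction of any association still has to be read from the table itself.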


Respondents’ replies compared to six cri- 
teria of interviewing. There were no signifi- 
cant relationships between supervisory ratings 
of interviewers’ past year’s work and any 
items from the R. R. Scale. Similarly, there 
were no significant relationships between the 
serial position of the interview in relation to 
the total number of interviews done by one 
interviewer and any items from the R. R. 
Scale. 

One item from the R. R. Scale was signifi- 
cantly related to the actual number of sur- 
veys for NORC in which interviewers had 
participated. Four items showed significant 
relationships to the ratios of refusals to the 
number of interviews completed. However, 
there were no trends apparent as to direction 
of association. 

The survey questionnaire contained four- 
teen questions where a “don’t know” answer 
could have been recorded and not be of use 
in the reporting of the questionnaire results. 







In all other places where a “don’t know” an- 
swer could be recorded, it was considered as 
usable, since a respondent could legitimately 
not know some answers. The cases where 
“don’t know” was considered as nonlegiti- 
mate consisted of questions which involved 
only an opinion and not factual data. An 
indication of how well the interviewer probed 
may be estimated by a count of such answers. 
For every item on the R. R. Scale there was 
a chi-square value indicating a significant re- 
lationship at the .01 level. There was a con- 
sistent tendency for those respondents who 
didn’t answer an R. R. Scale item to give a 
larger number of nonlegitimate “don’t knows.” 

The contingency coefficients for all signifi- 
cant relationships between R. R. Scale items 
and criteria of interviewing are shown in 
Table 1. 

Since direction of association is not indi- 
cated by contingency coefficients, Table 2 is 


included to show the trends apparent in the 
contingency tables. 

As shown in Tables 1 and 2, the direc- 
tion of associations consistently indicated that 
fewer “don’t know” answers were obtained 
from respondents who reported that inter- 
viewers have used techniques generally con- 
sidered good by supervisors. Significant re- 
lationships without apparent trends are not 
shown in Table 2. 

During the interview there were several
opportunities for a respondent to give free
response answers. Most of the questions asking
how, what, or why of a respondent followed
a filter question, which determined
whether the free response questions should be
used, e.g., “Do you think we should have
rationing of coffee at this time?” If answered
yes, then the free response question might
ask why, how, etc.

According to the usual coding criterion, 


Table 2

Two Characteristics of Respondents’ Answers to Questionnaire Items and Three Interviewer Ratings
Compared to Some Respondents’ Answers on the Respondent Rating Scale

Column headings

Respondents Who Gave:
    Fewer “don’t know” Answers
    More Usable Answers
Interviewers Rated:
    Interviews More Enjoyable
    Respondents More Cooperative
    Respondents More Frank and Honest

Respondents Indicated on the Respondent Rating Scale that: (rows)

They enjoyed the interview
The interview took little time
The technique of opinion polling was useful
The interviewer came at a good time
They felt like they were having a friendly discussion
They felt as though they were voting in an election
They talked with the interviewer about other things
The interviewer went right into the questions
The interviewer took a lot of time to explain
The interviewer repeated the questions
They felt they should think carefully before answering
The interviewers’ opinions were similar to theirs or the interviewers had no opinions

(The asterisks marking individual cells are not recoverable.)

* Indicates an apparent relationship.







free response answers which were vague, irrelevant,
illegible, or “don’t know” were not
counted as usable. The ratio of usable an- 
swers to eligible questions was used to equate 
all of the respondents, since the number of 
usable free response answers that could be 
given depended on how the filter questions 
were answered. This ratio includes all usable 
answers even if more than one answer was 
given to one question. Thus a ratio of 1.000 
may indicate two usable answers to each of 
five questions and no answers to five others, 
or one usable answer to each of ten questions. 
(Although this is a gross measure of good interviewing,
it is one of the methods of evaluating
interviewers suggested by several prominent
investigators.)
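
The ratio described above can be sketched in a few lines. This is a minimal illustration, assuming each respondent's record is simply a list with one count of usable answers per eligible free-response question; the function name and data layout are hypothetical, not the coding scheme actually used.

```python
def usable_ratio(usable_per_question):
    """Ratio of usable free-response answers to eligible questions.

    usable_per_question holds one count per eligible question
    (0 if the answer was vague, irrelevant, illegible, or "don't know").
    """
    eligible = len(usable_per_question)
    if eligible == 0:
        raise ValueError("respondent had no eligible free-response questions")
    return sum(usable_per_question) / eligible

# The two cases described in the text, both giving a ratio of 1.000
two_each_on_five = [2, 2, 2, 2, 2, 0, 0, 0, 0, 0]
one_each_on_ten = [1] * 10

print(usable_ratio(two_each_on_five), usable_ratio(one_each_on_ten))
```

Because multiple usable answers to one question all count, a ratio above 1.000 is possible, which is how the average for all respondents could reach 1.020.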

Inspection of the ratios shown in Fig. 1 
indicates that the ratio of usable answers to 
eligible questions was generally lowest for re- 
spondents who failed to answer individual 
R. R. Scale items. In two specific instances 
there was a lower ratio for respondents who 
answered one of the alternatives on a R. R. 
Scale item than for those who didn’t answer 
the item. These two alternatives were “Being
on the witness stand in court,” and “Interviewer’s
opinions seemed to be pretty
different from mine.” The highest ratio of
any alternative was given by those respond- 
ents who indicated that “Interviewer and I 
talked about other things from time to time 
while doing the survey.” The average ratio 
for all respondents was 1.020. Respondents 
who gave more usable answers tended to an- 
swer R. R. Scale items as shown in Table 2. 

Comparing Respondent Rating Scale and 
questionnaire results. The question, “Do you 
expect the United States to fight in another 
war within the next ten years?” was asked 
of the respondents during the regular inter- 
view and on the R. R. Scale. The relation- 
ship between the respondents’ answers on the 
question on both occasions was high (contingency
coefficient of .73, significant at the
.01 level). There were some differences between
answers on the questionnaire and those
given on the R. R. Scale. Table 3 shows some
of the characteristics of the respondents answering
differently or the same on both occasions.




Table 3

Percentages of Respondents Answering the Identical Question the Same Way or Differently on the
Questionnaire and the Respondent Rating Scale by Age, Educational Level, and Economic Level of Respondent

Item                    Per Cent Same   Per Cent Different   Per Cent

Age
  20-29                      91                                 100
  30-39                      90                                 100
  40-49                      90                                 100
  50 or over                 85                                 100

Educational level
  College                                       8               100
  High school                                                   100
  Grammar school                               15               100

Economic level
  [label illegible]                             8               100
  [label illegible]                                             100
  [label illegible]                                             100
  [label illegible]                                             100

All respondents


Respondents with more education answered
the same way on both occasions more frequently
than those with less education. The
respondents in the oldest age group (fifty and
over) shifted their answers more than those
in the younger groups, and the respondents in the
lowest economic group shifted their answers
more than those in the middle groups. Over all
respondents, about 89 per cent gave the same
answer both times, while 11 per cent shifted.
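
The percentages of respondents answering the same way on both occasions amount to a simple agreement count. The sketch below assumes each respondent is represented by a pair (questionnaire answer, R. R. Scale answer); the nine pairs are invented for illustration and are not data from the survey.

```python
def percent_same(pairs):
    """Percentage of respondents whose two answers match, rounded to a whole per cent."""
    same = sum(1 for first, second in pairs if first == second)
    return round(100 * same / len(pairs))

# Hypothetical answers (questionnaire, R. R. Scale) for nine respondents
answers = [("Yes", "Yes")] * 5 + [("No", "No")] * 3 + [("Yes", "Don't know")]

same = percent_same(answers)
different = 100 - same
print(same, different)
```

Computing this figure separately within each age, educational, or economic group gives the kind of breakdown reported in Table 3.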

While respondents were completing the R. 
R. Scale, the interviewers indicated whether 
they “went right into the questions without 
much explanation” or “took some time to explain
the purpose of the survey.” When these reports were
compared to respondents’ indications of whether
the interviewer took time to explain the purpose of
the survey, the contingency coefficient was
.23 (significant at the .01 level).

The two phrases “Interviewer (Respond- 
ent) and I talked about other things from 
time to time while doing the survey” and 
“Interviewer (Respondent) and I didn’t talk 
much about anything except the survey” were 
on the questionnaire and the R. R. Scale. A 
positive relationship was evident in the con- 







tingency table with contingency coefficient 
equal to .33 (significant at the .01 level). 

Interviewers’ ratings of respondents’ co- 
operativeness, frankness, and honesty, and 
how enjoyable they thought the interview 
was, were compared with respondents’ an- 
swers to each item on the R. R. Scale. For 
every item on the R. R. Scale there was a 
significant relationship (.01 level) with in- 
terviewers’ ratings of cooperativeness and 
with interviewers’ ratings of the enjoyability 
of the interview. All items but one were related
(.01 level of significance) to the interviewers’
ratings of respondents’ frankness and
honesty. Table 1 shows contingency coefficients
for these significant relationships.

Respondents who did not complete the R. 
R. Scale participated in interviews rated as 
less enjoyable, were rated as low in coopera- 
tion, and low on frankness and honesty. 
Trends apparent for respondents who par- 
ticipated in interviews rated as more enjoy- 
able, who were rated as more cooperative, 
and higher in frankness and honesty are 
shown in Table 2. 


Conclusions 


Experience from this survey indicated that 
the technique of administration of the R. R. 
Scale is feasible. Both respondents and inter- 
viewers cooperated in completing the study. 
Of 1,276 interviews completed only sixty re- 
spondents did not answer any of the R. R. 
Scale items. Twenty-six of the sixty refused 
to fill out the scale, while the others were unable
to do so, probably because of illiteracy.
Very few of the interviewers expressed dis- 
satisfaction in their reports to NORC. 

A significant relationship with two gener- 
ally accepted criteria shows the R. R. Scale 
to be a suitable instrument for partial evalua- 
tion of interviewers’ work. Four other cri- 
teria which did not show significant relation- 
ships with the R. R. Scale are not always 
acceptable as criteria of interviewers’ com- 
petence. 

It is suggested that the R. R. Scale is 
valuable not only as an evaluation tool, but, 
extending the suggestions of Connelly and 
Harris (9), the R. R. Scale might be profit- 
ably used as a selection tool. They sug- 


gested the use of “initial assignments” be- 
fore training as a means of cutting selection 
expense. An obvious addition would be the 
use of R. R. Scales at the time of initial in- 
terviewing so that objective data for selection 
purposes could be obtained. 

The high relationship between respondents’
answers to an identical item on the questionnaire
and the R. R. Scale not only indicates
reliability but also adds faith to the questionnaire
technique. Since the item was the first
one on the questionnaire and last on the R.
R. Scale, the high correlation showed that
the questionnaire and the R. R. Scale items
did not have a great influence on expressed
opinions. It is also interesting to note such a
high relationship, since the respondent replied
verbally to the question from the questionnaire
and inserted the answer himself on
the R. R. Scale.

Low positive relationships between inter- 
viewers’ and respondents’ indications of 
amount of explanation and outside discus- 
sion during the interview may be partially 
explained by the fact that interviewers 
checked their indications before respondents 
completed the R. R. Scale. During the fol- 
lowing period of time there may have been 
outside discussion or explanation of the sur- 
vey which was then indicated by the respond- 
ents. Some interviewers also commented that 
they never strayed from the topic of the in- 
terview, but that they had to get the re- 
spondent “back on the track.” When this 
occurred, respondents indicated it as outside 
discussion while interviewers were prone not 
to do so. 

Summary 


On a national survey a respondent rating
scale was used to measure the interviewer-respondent
relationship during the interview.
The rating scale was handed to respondents 
enclosed in an envelope and collected by the 
interviewers. The technique of administra- 
tion was considered feasible since few re- 
spondents did not fill out the scale and very 
few interviewers expressed any dissatisfaction. 

Respondents’ ratings were compared with 
six criteria of good interviewing and found to 
relate significantly with the number of non- 
legitimate “don’t know” answers obtained 







and the number of usable answers given to 
free response questions. The validity of the 
four criteria which did not relate to the rat- 
ings was questioned. 

A high relationship was found between the 
same question asked at the beginning of the 
questionnaire and as the last item on the 
rating scale. There was also a significant 
positive relationship between interviewers’ 


ratings of the things they did during the in- 
terview and respondents’ indications of the 
same things. 


Received April 13, 1954. 


References 


1. Andrews, L. de L. That dreadful interviewer
problem again. Int. J. Opin. Attitude Res.,
1949, 3, 587-590.

2. Bennett, A. S. Toward a solution of the cheater
problem among part-time research investigators.
J. Marketing, 1948, 12, 470-474.

3. Blankenship, A. B. The effect of the interviewer
upon the response in a public opinion poll.
J. consult. Psychol., 1940, 4, 134-136.

4. Blankenship, A. B. Consumer and opinion research.
New York: Harper, 1943.

5. Blankenship, A. B. A source of interviewer
bias. Int. J. Opin. Attitude Res., 1949, 3,
95-96.

6. Borg, L. E. Interviewing School. Int. J. Opin.
Attitude Res., 1948, 2, 393-400.

7. Cantril, H. (Ed.) Gauging public opinion.
Princeton: Princeton Univer. Press, 1944.

8. Clarkson, E. P. The problem of honesty. Int.
J. Opin. Attitude Res., 1950, 4, 85-90.

9. Connelly, G. M., & Harris, N. A symposium on
interviewing problems. Int. J. Opin. Attitude
Res., 1948, 2, 69-84.

10. Crespi, L. The interview effect in polling. Publ.
Opin. Quart., 1948, 12, 99-111.

11. Fisher, H. Interviewer bias in the recording operation.
Int. J. Opin. Attitude Res., 1950, 4,
391-411.

12. Freiberg, A. D., Vaughn, C. L., & Evans, M. C.
Effect of interviewer bias upon questionnaire
results obtained with a large number of investigators.
Amer. Psychologist, 1946, 1, 243.
(Abstract)

13. Friedman, P. A second experiment on interviewer
bias. Sociometry, 1942, 5, 378-381.

14. Guest, L. P. A study of interviewer competence.
Int. J. Opin. Attitude Res., 1947, 1, 17-30.

15. Guest, L. P., & Nuckols, R. A laboratory experiment
in recording in public opinion interviewing.
Int. J. Opin. Attitude Res., 1950,
4, 336-352.

16. Katz, D. Do interviewers bias poll results?
Publ. Opin. Quart., 1942, 6, 248-268.

17. Rice, S. A. Contagious bias in the interview.
Amer. J. Sociol., 1929, 35, 420-423.

18. Robinson, D., & Rohde, S. Two experiments
with an anti-semitism poll. J. abnorm. soc.
Psychol., 1940, 41, 136-144.

19. Shapiro, S., & Eberhart, J. C. Interviewer differences
in an intensive interview survey. Int.
J. Opin. Attitude Res., 1947, 1, 1-17.

20. Smith, H. L., & Hyman, H. The biasing effect
of interviewer expectations on survey results.
Publ. Opin. Quart., 1950, 14, 491-506.

21. Stanton, F., & Baker, K. Bias and the recall of
incompletely learned materials. Sociometry,
1942, 5, 123-134.

22. Wechsler, J. Interviews and interviewers. Publ.
Opin. Quart., 1940, 4, 258-260.

23. Williams, F., & Cantril, H. The use of interviewer
rapport as a method of detecting differences
between “public” and “private” opinions.
J. soc. Psychol., 1945, 22, 171-175.

24. Wyatt, D. F., & Campbell, D. T. A study of interviewer
bias as related to interviewers’ expectations
and own opinions. Int. J. Opin.
Attitude Res., 1950, 4, 77-83.





The Journal of Applied Psychology
Vol. 39, No. 2, 1955


Aptitude, Achievement, Interest, and Personality Tests:
A Longitudinal Comparison ¹


Ralph F. Berdie 


Student Counseling Bureau, Office of the Dean of Students, 
University of Minnesota 


Do tests of special abilities differentiate 
occupational and educational groups to a 
greater or lesser extent than do vocational in- 
terest tests? The relatively greater emphasis 
placed upon ability and aptitude tests in edu- 
cational and vocational guidance suggests that 
these tests have superior differentiating power. 
On the other hand, the standardization of 
vocational interest tests frequently is based 
upon their effectiveness in differentiating 
among occupational groups. The present 
study allows a comparison to be made be- 
tween different types of tests, both in terms 
of how effectively they differentiate among 
groups and how predictive they are of vari- 
ous educational outcomes and experiences. 

In the fall of 1939, slightly over one thou- 
sand of the approximately 1,500 entering 
freshmen in the College of Science, Litera- 
ture, and the Arts at the University of Min- 
nesota were administered a comprehensive 
battery of tests including: 


Thurstone Primary Mental Abilities Test
(seven separate scores)
Strong Vocational Interest Blank
(men’s and women’s forms used appropriately)
Cooperative Social Studies Test
(Form P, 1939)
Cooperative Natural Science Test
(Form P, 1939)
Cooperative Mathematics Test
(Form P, 1939)
Minnesota Personality Inventory
(Scores on Morale, Social Adjustment, Family Adjustment,
Emotional Adjustment, Economic Conservatism)


In addition to these measures, scores on the
1937 American Council on Education Psychological
Examination and the Cooperative
English Test, Form OM, and scholastic rank
in the graduating high school class were available.
These latter three measures were obtained
through the State-Wide Testing Program
sponsored by the Association of Minnesota
Colleges.


1 This study was financed with funds obtained
from the Commission on Human Resources and Advanced
Training and the University of Minnesota’s
Graduate School and Office of the Dean of Students.

When these freshmen completed their first 
year in college, their grade-point averages 
were obtained and included in an analysis of 
the test data (1). The results of this pre- 
liminary analysis can be summarized briefly. 
The zero-order correlations between tests and 
grades were higher for women than for men. 
Any one part of the Thurstone test for either 
sex group was a less effective predictor of 
college achievement than the more economi- 
cal and accessible achievement tests or gen- 
eral ability test. For men, earned first-year 
grades correlated .49 with predicted grades 
based upon high school rank, American 
Council on Education test, mathematics test, 
social studies test, and natural science test. 
The multiple correlation for men between 
first-year grades and the seven primary men- 
tal abilities was .38. Intercorrelations among 
the seven primary mental abilities scores 
were low and indicated independent variance 
among the seven scores. Each of the seven
primary abilities scores is based upon two
or three subtests. The scores on these sub- 
tests for any one factor also tended to have 
low intercorrelations. 

A tendency was reported for the scores on 
the personality scale to be more closely re- 
lated to vocational interest scores than were 
scores on the seven primary mental abilities 
tests. Darley concluded, “These results dem- 
onstrate that vocational interests are related 
to other tested personality characteristics and 
suggest that systematization of the present 
chaotic order existing in personality measure- 
ment awaits only the development of new 
experimental techniques” (1, p. 199). 

Such a rich variety of test information ob- 
tained on such a large group of beginning 







students provided a promising opportunity to 
perform a longitudinal analysis. In 1949, a 
follow-up study was initiated, and Univer- 
sity records were obtained for the students 
originally tested as freshmen. During the 
nine years intervening between the time of 
testing and the time of the follow-up study, 
many of these students went to war, trans- 
ferred to other colleges and universities, or 
left college altogether, and a sizable number 
graduated from various curricula in the Uni- 
versity. In this study, no attempt was made 
to follow up students other than those who 
obtained degrees from the University of Min- 
nesota prior to 1949. Of the 554 men who 
were tested in 1939, 219, or 40 per cent, and 
of the 547 women tested, 252, or 46 per cent, 
obtained collegiate degrees from Minnesota. 
The types of degrees these graduates earned 


Table 1

Types of Degrees Earned Before 1949 by Freshmen of 1939

Type of Degree                          No. of Men   No. of Women

Business (B.B.A.)                           64            20
Medicine (M.D.)                             37             3
Dentistry (D.D.S.)                          21             0
Education (B.S.)                            18*           17†
Journalism (B.A.)                           14            10
Law (L.L.B.)                                11             4
Engineering and chemistry (B.S.)            10             0
Social studies (B.S. in education)           9             5
Social studies (B.A.)
Pharmacy (B.S.)
Associate of Arts (A.L.A.)
Economics (B.A.)
English and speech (B.A.)
Psychology (B.A.)
Geology (B.A.)
Nursing (B.S.)
Medical technology (B.S.)
Music and art (B.A. and B.S.)
Library training (B.S.)
Foreign language (B.A. and B.S.)
Dental hygiene (G.D.H.)
Other degrees                                0

Total no. of entrants‡                     554           547
Total no. of graduates                     219           252

* Includes the 9 social studies (B.S. in education).
† Elementary education only.
‡ Not including 20 men and 10 women who left the University before receiving any grades.


are shown in Table 1. If one considers that 
many of the entering freshmen actually ob- 
tained degrees in other institutions, the 43 
per cent who earned degrees from the Uni- 
versity of Minnesota within nine years after 
matriculation agrees reasonably well with the 
often quoted figure that 50 per cent of col- 
lege freshmen eventually earn degrees. 
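
The percentages quoted above can be checked directly from the counts given in the text (554 men and 547 women tested; 219 and 252 graduates). The short sketch below only redoes that arithmetic; the helper name `per_cent` is introduced here for illustration.

```python
men_tested, men_graduated = 554, 219
women_tested, women_graduated = 547, 252

def per_cent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole per cent."""
    return round(100 * part / whole)

men_pct = per_cent(men_graduated, men_tested)
women_pct = per_cent(women_graduated, women_tested)
# Men and women pooled: 471 graduates out of 1,101 entrants
combined_pct = per_cent(men_graduated + women_graduated,
                        men_tested + women_tested)

print(men_pct, women_pct, combined_pct)
```

The pooled figure reproduces the 43 per cent cited in the comparison with the often quoted 50 per cent.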

From the academic records of the students, 
in addition to the kinds of degrees obtained, 
grades and number of credits completed in 
various types of courses were determined. 

The extent to which tests differentiated 
among the various curricular groups was de- 
termined through the use of analyses of vari- 
ance. The homogeneity of variance was first 
tested using Hartley’s M test and the analy- 
ses of variance were completed only in those 
cases where the variance was homogeneous 
at the .01 level of probability. In making 
comparisons between particular groups, the 
significance of the differences between means 
was determined through the use of critical 
ratios and t tests. Percentage of overlapping
was also computed to clarify some of the re- 
sults. 
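
As a rough illustration of the critical-ratio comparison mentioned above, the sketch below divides the difference between two group means by the standard error of that difference. The article does not give its formula, so the unpooled standard error used here is an assumption. The first group's figures are those quoted in the Results section for the 64 Business graduates on the American Council on Education examination; the second group is invented purely for illustration.

```python
import math

def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by the standard error of the difference."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error

# ACE scores of Business graduates as reported in the article (mean, SD, n)
business = (84.1, 21.3, 64)
# Hypothetical comparison group, invented for illustration only
comparison = (96.0, 20.0, 50)

cr = critical_ratio(*business, *comparison)
print(round(cr, 2))
```

Under the normal approximation, an absolute critical ratio above 2.58 corresponds to significance at the .01 level.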

Product-moment correlations were computed 
between test scores and honor-point ratios in 
special course areas and number of credits of 
work completed in these areas. The means 
and standard deviations were computed for 
each of the curricular groups, thus allowing 
the construction of curricular profiles. 
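For reference, a product-moment (Pearson) correlation between a test score and an honor-point ratio can be computed as in the minimal sketch below. The function name and the sample data are illustrative and not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical test scores and honor-point ratios for five students.
test_scores = [52, 61, 70, 74, 88]
honor_points = [1.1, 1.4, 1.3, 1.9, 2.2]
r = pearson_r(test_scores, honor_points)
```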


Results 


Table 2 presents the means, standard devia- 
tions, and sizes of the samples for all of the 
tests for the larger curricular groups of 
males, and Table 3 presents similar informa- 
tion for the women. In both tables, the first 
three sets of data provide information for all 
entering freshmen, for all who graduated from 
the University of Minnesota, and for the re- 
maining nongraduates. The tables can be 
read in this way: There were 64 male Arts 
College freshmen in 1939 who later graduated 
with degrees of Bachelor of Business Admin- 
istration, and as freshmen they obtained a 
mean score of 84.1 and a standard deviation 
of 21.3 on the American Council on Educa- 
tion Psychological Examination. 





Table 2
Means, Standard Deviations, and Sizes of Samples for All Tests for Various Curricular Groups of 1939 Freshman Men Followed Up in 1949

[Table body not legible in this copy.]





Table 3
Means, Standard Deviations, and Sizes of Samples for All Tests for Various Curricular Groups of 1939 Freshman Women Followed Up in 1949

[Table body not legible in this copy.]





A Longitudinal Comparison of Tests 


Table 4 summarizes the results of the 
analyses of variance and lists the groups in- 
cluded. Eight of the men’s curricular groups, 
none with an N of less than 9, and nine of 
the women’s curricular groups, none with an 
N of less than 10, were included in the 
analyses. For the men, seven of the tests 


Table 4 
Analysis of Variance Results Based on Tests Taken 
by 1939 Science, Literature, and Arts College 
Freshmen Who Later Graduated from 
the University of Minnesota 


Men 
F Value 


Women 
F Value 


+ 
+ 

2.40* 

2.06* 


Test 
High school rank t 
ACE 86 
Coop. English .16* 
Primary Mental Abilities 
Perception 3.19F 
1.14 
2.00* 
1.70 
<1.00 
1.67 
AN ie 


Numbers 
Verbal 
Space 
Memory 
Induction 
Reasoning 
Minnesota Personality Scale 
Morale <1.00 
<1.00 
1.18 
<1.00 
1.15 
3.38T 
4.49F 
4.09F 


Social Adjustment 

Family Adjustment 

Emotional Adjustment 

Economic Conservatism 
Coop. Mathematics 3.03T 
Coop. Natural Science 5.28f 
Coop. Social Studies ao” 
Strong Vocational Interest Blank 

Physician 

Dentist 


+ 
+ 
31 
+ 
+ 
+ 
+ 
+ 
+ 


19.58t 
13.84f 
10.46f 

6.57T 


Lawyer t 


Engineer 

Social Science Teacher 

Nurse 6.65T 

12.84t 
1.83 
English Teacher 7.87T 

Social Science Teacher 5.23t 


General Office Worker t 


Physician 
Librarian 


Note.—Men's groups include: Business 64, Medicine 37, 
Dentistry 21, Journalism 14, Law 11, Engineering and Chemistry 10,
Social Studies-Education 9, Social Studies-S.L.A. 9,
total 175. Women's groups include: Nursing 35, Medical 
Technology 25, Business 20, Elementary Education 17, B.S. in 
Social Studies 15, B.A. in Social Studies 13, Librarian 12, 
Language 11, Journalism 10, total 158. 

* p between .05 and .01.

† p < .01.

‡ Variance not homogeneous, analysis of variance not computed.




did not provide homogeneous variance and 
no analyses of variance were done for these 
scores. Of the five personality tests, four 
did not have homogeneous variance. 

The tests differentiating significantly among 
the men’s curricular groups included the four 
cooperative achievement tests, two of the 
seven Primary Mental Abilities tests (verbal 
and reasoning), and four of the five occupa- 
tional scales studied on the Strong Vocational 
Interest Blank. The between-means variances
were significant at the .05 level for the
two Primary Mental Abilities tests, at the .05 
level for two and at the .01 level for two of 
the achievement tests, and at the .01 level 
for all four of the vocational interest scales. 

In only one case of the women’s tests were 
the variances not homogeneous. Again, all 
four of the achievement tests differentiated 
among the curricular groups, three at the .01 
level and one at the .05 level, and three of 
the Primary Mental Abilities tests differenti- 
ated significantly (perception, verbal, and 
reasoning), the latter two being at the .05 
level and the first at the .01 level. Four of 
the six vocational interest scales on the wom- 
en’s blank differentiated significantly among 
the curricular groups at the .01 level. One 
of the scales, Librarian, failed to differentiate 
significantly, and the sixth scale, General Of- 
fice Worker, had nonhomogeneous variance. 

In summary, for both sexes high school 
rank provided variances that were not homo- 
geneous. With both sexes, the achievement 
tests differentiated among the groups as did 
the verbal and reasoning tests. The percep- 
tion test differentiated for women, but not for 
men. The vocational interest scales differentiated
quite consistently.

These results, presented in terms of F
values, allow comparisons to be made among
the various tests. The F value is the ratio
of the among-groups variance to the
within-groups variance, and the larger the
F value, the more effectively the test
differentiates among groups.
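The F ratio referred to here can be illustrated with a minimal one-way analysis of variance. The sketch assumes the standard textbook computation (among-groups mean square divided by within-groups mean square); the function name and the toy data are hypothetical.

```python
def one_way_f(groups):
    """Return the one-way ANOVA F ratio for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Among-groups sum of squares: group means about the grand mean.
    ss_among = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: scores about their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))

    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_among / ms_within


# Three toy curricular groups whose means differ by one point each.
f_value = one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```

With these toy groups the among-groups mean square is 3 and the within-groups mean square is 1, so the ratio is 3.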

The sizes of these F values suggest that 
the vocational interest scores were more ef- 
fective in differentiating among the curricular 
groups for both men and women than were 
any of the other tests, and the achievement 







tests were more effective differentiators than 
were the aptitude tests. Only a few of the 
aptitude tests were effective as differentiators 
and the sizes of the attained F values were 
not large enough to be impressive. None of 
the personality inventory scores differenti- 
ated among the curricular groups. Thus, the 
tests can be ranked in this order according to 
the effectiveness with which they differenti- 
ated among these groups: interest tests, 
achievement tests, ability tests, personality 
tests. 

In order to help interpret the statistical re- 
sults, the extent of overlapping among ex- 
treme groups of men was determined for five 
different tests having different F values. The 
F value for the men on the memory score 
was less than one, indicating no statistically 
significant variance among the means. The 
highest and lowest mean scores on this test 
were obtained respectively by the men later 
obtaining degrees in medicine and those ob- 
taining degrees in dentistry. When these two 
groups were compared on the basis of this 
test, 88 per cent overlapping was found, or 
only 12 per cent of the total group consisting 
of 37 M.D.’s plus 21 D.D.S.’s had scores not 


falling within the range of the specific group 
to which the person did not belong. 
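One reading of this overlap measure is the percentage of the combined group whose scores fall inside the score range of the group they do not belong to. The sketch below implements that reading; it is an interpretation of the description, not a confirmed reconstruction of the author's procedure, and the names are hypothetical.

```python
def percent_overlap(group_a, group_b):
    """Percentage of all persons whose score lies within the other group's range."""
    lo_a, hi_a = min(group_a), max(group_a)
    lo_b, hi_b = min(group_b), max(group_b)
    # Members of A inside B's range, plus members of B inside A's range.
    inside = sum(lo_b <= x <= hi_b for x in group_a)
    inside += sum(lo_a <= x <= hi_a for x in group_b)
    return 100.0 * inside / (len(group_a) + len(group_b))


# Toy example: two of four scores in each group fall in the shared range 3..4.
overlap = percent_overlap([1, 2, 3, 4], [3, 4, 5, 6])
```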

The F value for the social adjustment scale 
was 1.31, again a nonsignificant variance 


among means. The two extreme groups of 
men on this score were the engineers and 
chemists and the social science B.A.’s. The 
degree of overlapping between these two small 
groups was relatively small, 100 per cent of 
the engineers and chemists falling below the 
median score of the social science majors, 
with a total of 47 per cent overlapping. 

On the reasoning test, the F value was 
2.11, indicating a variance among the means 
significant at the .05 level. The two ex- 
treme groups on this test were the M.D.’s 
and the social studies B.S.’s. Of the social 
science majors, 88 per cent obtained reason- 
ing test scores below the median of the 
M.D.’s and the total amount of overlap was 
76 per cent. 

On the mathematics test, the F value was 
3.03, indicating a significant variance among 
means. The two extreme groups were the 




M.D.’s and the journalism majors and 83 per 
cent of the journalism majors obtained scores 
below the median score for the M.D.’s with 
a total overlap of 80 per cent. 

The score with the highest F value, 19.58, 
was the Physicians’ Interest scale and the ex- 
treme groups here were the M.D.’s and the 
business majors. Of the business majors, 97 
per cent obtained scores below the median 
score of the M.D.’s with a total overlap of 
66 per cent. Of the 64 business majors, 27 
obtained scores lower than any score ob- 
tained on this scale by the M.D.’s and 25 per- 
sons, or 67 per cent of the total group of 
M.D.’s, obtained scores higher than those ob- 
tained by 97 per cent of the business majors. 

Thus, comparisons indicated considerable 
overlapping among extreme groups even when 
the F value was highly significant, but never- 
theless the degree of differentiation among 
these groups was impressive. 

Many of the students in this study ob- 
tained counseling in the University during 
their college careers, and this counseling was 
based in part upon the scores of many of the 
tests used in the analysis. Thus, the results 
of the study could have been to some extent 
contaminated by the biases and preferences 
shown by counselors. For instance, if the 
counselors who worked with these students 
tended to place much more emphasis on the 
interest test than they placed upon the ability 
or achievement tests, we would expect the in- 
terest test to differentiate more among groups 
than the other tests. Fortunately, an esti- 
mate could be made of this contamination. 

Of the 64 men who received degrees in 
business, 41 had been counseled at the Stu- 
dent Counseling Bureau where the counselors 
had access to all of the test scores. Com- 
parisons of means and variances on all the 
tests were made between the counseled group 
and the noncounseled group. The mean dif- 
ferences were significant (at least at the .05 
level) on only the mathematics test, and the 
variances were significantly different only on 
high school percentile rank. Thus, the men 
who received degrees in business and who 
were counseled might have been influenced by 
this counseling, but they appeared to be no 
different on the basis of their test scores than 







men with business degrees who were not 
counseled. 

A similar comparison was made between the men who received M.D.’s who were counseled and those who were not. The means of the 13 counseled men differed significantly from the means of the noncounseled only in the case of the ACE, and the variances differed on the perception, spatial, and economic conservatism scores.

The available evidence suggests that the extent to which the curricular groups were differentiated by the tests was not influenced by the varying effectiveness with which counselors used these tests.

Some of the critical ratios estimating the significance of the difference between the mean scores of pairs of male groups provided the following results. On the natural science test, the means of the business and medical groups were significantly different at the .01 level of probability. On the social studies test, the social studies majors and dentistry graduates were significantly different, with the level of probability being between the .01 and the .05 level. On the verbal test, the medical group and the dental group were significantly different at the .05 level, and on this test the business group and medical group were significantly different at the .01 level. The difference between the business group and journalism group was significant at the .05 level. The difference between the business group and the law group was not statistically significant. On the reasoning test, the difference between the medical and social studies groups was statistically significant at the .01 level. The difference between the medical and education groups was significant at the .01 level. On the mathematics test, the difference between the medical and journalism groups was significant at the .01 level, and on the social studies test, the difference between the social studies group and the business group was statistically significant between the .01 and the .05 level.

Fig. 1. Curricular profiles for business graduates (N = 64). [Chart not reproduced; scales: percentile rank and sigma scores.]

Fig. 2. Curricular profiles for medical graduates (N = 37). [Chart not reproduced; scales: percentile rank and sigma scores.]


Curricular Profiles 


The analysis allowed curricular profiles to be prepared. Samples of these profiles are shown in Fig. 1-4, two for men and two for women. The solid lines attach the mean scores, expressed as percentile scores. The dotted lines show the percentile scores corresponding to scores one standard deviation above and below the mean for each test. Thus, the men’s profile for business graduates shows that they had relatively superior achievement in high school, relatively good scores on the ACE, relatively poor scores on the English test, and about average scores for college freshmen on the Primary Mental Abilities tests with the exception of high scores on the numbers test, a test that did not significantly differentiate among the curricular groups. The Strong profile showed mean scores characteristic of persons teaching social science.

The profile for the medical graduates shows a consistently high set of scores with exceptionally high indices based on high school percentile rank and the ACE, and notably high scores on all of the achievement tests, particularly the mathematics and the natural science tests. The profiles for the nursing graduates and medical technology graduates are presented because of the general similarity of the interest profiles and the marked differences shown in the ability and achievement profiles. The nursing graduates tend to have mean scores falling rather close to the average for the entire group of entering women, whereas the medical technology graduates tend to have high scores on both aptitude and achievement tests. In terms of mathematics and natural science, the women in medical technology compare favorably to the men who graduated from medical school. Curricular profiles of this sort should be of great value to counselors working with high school seniors and entering college freshmen.

Fig. 3. Curricular profiles for medical technology graduates. [Chart not reproduced; scales: percentile rank and sigma scores.]

Fig. 4. Curricular profiles for nursing graduates (N = 35). [Chart not reproduced; scales: percentile rank and sigma scores.]
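The three points plotted for each test in a curricular profile (the group mean and the scores one standard deviation above and below it, each expressed as a percentile) can be sketched as follows. The sketch assumes the percentiles are taken against the scores of all entering freshmen and uses a mid-rank convention for ties; both choices, and all names, are assumptions rather than details given in the article.

```python
import statistics


def percentile_rank(value, reference):
    """Percentage of reference scores below value, counting ties as half."""
    below = sum(s < value for s in reference)
    equal = sum(s == value for s in reference)
    return 100.0 * (below + 0.5 * equal) / len(reference)


def profile_point(group_scores, reference_scores):
    """Return percentile ranks for (mean - 1 SD, mean, mean + 1 SD)."""
    mean = statistics.mean(group_scores)
    sd = statistics.stdev(group_scores)
    return tuple(
        percentile_rank(v, reference_scores) for v in (mean - sd, mean, mean + sd)
    )


# Hypothetical reference distribution (all entrants) and one curricular group.
all_entrants = list(range(1, 101))
low, mid, high = profile_point([40, 50, 60], all_entrants)
```

Joining the middle values across tests gives the solid line of a profile, and the outer values give the dotted lines.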







Prediction of Grades


The effectiveness of these tests as predic- 
tors of grades for the entire group of enter- 
ing men was also studied and the results were 
not impressive. Attention was paid only to 
zero-order correlations. Courses taken during
the student’s entire University career
were divided into these areas: verbal, social 
studies, mathematics, physical science, bio- 
logical science, and total grades. In general, 
high school percentile rank was among the 
best predictors both for total grades and for 
grades in each of the course areas. In the 
verbal courses, the English test was the best 
test predictor of grades, followed closely by 
the ACE. Some of the obtained correlations 
between tests and grades in various courses 
were: 


PMA verbal test and grades in verbal courses
ACE and grades in verbal courses
Social Science test and grades in social studies courses
PMA verbal test and grades in social studies courses
Mathematics test and grades in mathematics courses
PMA reasoning test and grades in mathematics courses
PMA numbers test and grades in mathematics courses
Mathematics test and grades in physical science courses
PMA induction test and grades in physical science courses
English test and grades in biological science courses
PMA verbal test and grades in biological science courses
Natural Science test and grades in biological science courses
Social Science test and grades in biological science courses


None of the Primary Mental Abilities tests 
had correlations higher than .20 with total 
grades. The correlations between achieve- 
ment tests and total grades were .30, .26, and 
.33.

Correlations also were computed between 
test scores and the number of credits com- 
pleted in course areas: 


English test and number credits in verbal courses .36
Natural Science test and number credits in physical sciences .36
Mathematics test and number credits in physical sciences .28
Natural Science test and number credits in biological sciences .35
Mathematics test and number credits in biological sciences .26


With social studies credits, none of the cor- 
relations exceeded .16, except the negative 
correlation of — .21 between the PMA space 
test and the number of credits taken in the 
social studies. None of the tests seemed to 
be significantly correlated with the number 
of credits taken in mathematics except for 
the mathematics test which correlated .18. 

Thus, the correlations between grades in 
course areas and test scores were all relatively 
low, but in the case of all the achievement 
tests, and in some cases of the Primary Men- 
tal Abilities tests, in the expected direction. 
The correlations between test scores and the 
number of credits taken in various course 
areas also were relatively low, but some sig- 
nificant correlations were found, indicating a 
tendency for those persons scoring high in 
tests to later take more college courses in 
related areas. 

Two additional analyses were possible with 
these data. First, comparisons of test scores 
were made between the 219 men who gradu- 
ated and the 335 men who did not and be- 
tween the 252 women who graduated and the 
295 women who did not. For the men, tests 
that differentiated significantly (at least the 
.05 level) between these two groups were the 
ACE, the Cooperative English test, the Co- 
operative Natural Science test, the Coopera- 
tive Mathematics test, the Cooperative Social 
Science test, the verbal test, the space test, 
and the morale score. For the women, the 
tests that differentiated between these two 
groups were the ACE, the four Cooperative 
Achievement tests, the verbal test, and the 
space test. Variances were significantly dif- 
ferent for the Social Studies test. 

Secondly, the group of 37 Arts College 
freshmen who received medical degrees was 
studied more intensively in order to deter- 
mine significant relationships between test 
scores and rank in their graduating class in 
medicine. Significant (at the .05 level) cor- 
relations with class rank were: first quarter 
honor-point ratio in premedicine, .40; social 







adjustment score, .42; emotional adjustment 
score, .32; and numbers score, .31. 


Conclusions 


Tests given to a group of entering college 
freshmen bear a substantial relationship to 
the future careers of these college students 
as shown by the degrees they obtain from 
college. These tests can predict to a limited 
extent the grades students will obtain in 
various kinds of courses and to some extent 
the number of courses they will take in vari- 
ous areas. 

Curricular profiles can be constructed to 
allow the counselor and the student to com- 
pare the student’s test scores with the scores 
obtained by college freshmen eventually suc- 
ceeding in certain curricula. 

The results leave little question that vo- 
cational interest tests differentiate better 
among curricular groups than do other kinds 
of tests, and that the prediction of which 
curriculum a student will graduate from can 
be made better with an interest test than 
with either aptitude tests or achievement 
tests. 




For those training programs which are at 
the college levels, differential abilities do not 
appear to be very important when compared 
to differential interests. The evidence sug- 
gests that abilities cannot be disregarded, 
and that both tests of abilities and achieve- 
ment can make real contributions to the col- 
lege counseling program, but these results 
even more emphatically delineate the need 
for interest measurement in counseling. Dif- 
ferential educational and vocational distribu- 
tion at the college level, as shown by attain- 
ment of college degrees, is much more de- 
pendent upon motivations and interests than 
upon special abilities. 


Received March 30, 1954. 


Reference 


1. Darley, J. G. A study of the relationships among 
the Primary Mental Abilities Test, selected 
achievement measures, personality tests, and 
tests of vocational interests. Studies in Higher 
Education, Biennial Report of the Committee 
on Educational Research, 1938-40. Minne- 
apolis: Univer. of Minnesota, 1941. 





The Journal of Applied Psychology 
Vol. 39, No. 2, 1955 


Predicting Academic Achievement with the Differential 
Aptitude and the Primary Mental Abilities Tests 


William D. Wolking 


University of Minnesota¹


The trend in aptitude testing has been to- 
ward the special measurement of more or less 
specific abilities by a battery of tests which 
has been standardized as a group and which 
makes use of the profile scoring approach (2,
pp. A1-A3; 11, chap. 14 and 15). The principle
underlying this type of test battery is
that each measurable aptitude is usable in 
a number of prediction problems; hence, a 
standard test battery can be constructed and 
normed in such a way as to yield scores for 
predicting a number of different criteria. 
Various methodological approaches have been 
used in the construction of such a test bat- 
tery. Two of these approaches have resulted 
in test batteries which have many similari- 
ties, at least superficially, and which have 
commanded attention in the field of aptitude
testing. The older of these is the Tests of 
Primary Mental Abilities (PMA) developed 
by Thurstone (12, 13) as a practical imple- 
mentation of his factorial studies of intelli- 
gence. The more recent battery is the Dif- 
ferential Aptitude Tests (DAT), constructed 
by Bennett, Seashore, and Wesman (2) 
through the use of the differential scores ap- 
proach (1, 9, 15). 

Because these batteries do have certain similarities of structure, content, and use; and because it is commonly assumed that both batteries are based on factorial research and measure similar factors in part (11, p. 370), it was felt that a direct comparison of the scores of these two batteries would be useful. This study is primarily concerned with comparing the intercorrelations and the validities for predicting high school grades of the DAT and the PMA tests. While neither battery makes an attempt to measure the content of specific school subjects, the DAT is committed to “. . . employing tests each of which provides an educationally significant score” (2, p. A2).

¹ This paper is a condensation of a colloquium paper submitted to the Graduate Faculty of the University of Minnesota in partial fulfillment of the requirements of the M.A. degree. The author wishes to express his gratitude to Dr. Ralph Berdie for his encouragement and assistance. The author also acknowledges the whole-hearted cooperation of Mr. George Scott, Principal, and Miss Evangeline Malchow, Guidance Counselor of Central High School, La Crosse, Wisconsin.

Three specific problems are to be investi- 
gated: (a) To what extent are the corre- 
sponding tests of these two batteries measuring
the same aptitude? (b) Do the tests
show the highest predictive value for those 
school subjects which are commonly believed 
to have the highest loadings of that factor? 
For example, do the verbal scores correlate 
most highly with English grades and the 
number scores with mathematics grades? 
(c) How do the corresponding tests of these 
two batteries compare in predicting academic 
achievement at the high school level? 


Subjects and Procedure 


The subjects (Ss) were 139 girls and 128 boys from the eleventh grade of Central High School, La Crosse, Wisconsin. This was 93 per cent of the total class, thirteen girls and seven boys being absent the day of testing. Children of the professional, semiprofessional, and managerial classes appear in representative proportions, while children of the rural and unskilled labor classes are slightly underrepresented. The difference is made up by an overrepresentation of children of the semiskilled labor class. Age data were: boys, range 14-8 to 18-3, mean 16.43 years; girls, range 15-0 to 17-11, mean 16.30 years. The best evidence for the adequacy of the sample is the fact that the means and standard deviations of the subtests, taken separately by sex, are almost identical with the published norms for the DAT. And Bechtoldt (4, p. 677) says these norms are “. . . perhaps the most adequate special aptitude or ability test norms currently available.”

Considerations of time and expense necessitated 
the restriction of this study to a comparison of the 
verbal, numerical, and spatial subtests of these two 
batteries. 

All tests were administered in a single half-day 
testing session in the fall of the year. A counter- 
balanced design was employed in an attempt to con- 
trol and evaluate the effects of practice and fatigue. 
The total sample was divided into two groups 
through the use of a table of random numbers (8, 







Table 18). Group I consisted of 60 boys and 73 
girls who were given the PMA first and the DAT 
second. Group II consisted of 68 boys and 65 girls 
who were given the test batteries in the reverse 
order. 
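The random division into two test-order groups can be sketched with a seeded shuffle standing in for the table of random numbers. This is an illustration under that assumption only; the function name, the group labels, and the seed are hypothetical.

```python
import random


def split_for_counterbalancing(subject_ids, seed=0):
    """Randomly split subjects into two halves with opposite test orders."""
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "PMA_then_DAT": shuffled[:half],
        "DAT_then_PMA": shuffled[half:],
    }


# 266 hypothetical subject identifiers, matching the two groups of 133.
orders = split_for_counterbalancing(range(266), seed=1)
```

Because each battery is taken first by one half and second by the other, practice and fatigue effects are balanced across batteries rather than confounded with them.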

Tests of significance were made of the mean ages 
and the means and standard deviations of the sub- 
tests, taken separately by sex, between these two 
randomly chosen groups. The difference between 
the standard deviations of the number test of the 
PMA battery (girls) resulted in a t of 2.05 (significant
at the 5 per cent level of confidence) and was
the only significant difference found. Thus, the 
randomization and counterbalanced design appear to 
have been effective. 

Form A of the verbal reasoning, numerical ability, 
and space relations tests of the DAT battery were 
used. The Single Booklet Edition of the Chicago 
Tests of Primary Mental Abilities, ages 11 to 17, 
was administered in its entirety. Standard instruc- 
tions were adhered to in both cases. The tests were 
administered by qualified teachers as part of the 
regular testing program of the school. Testing con- 
ditions were satisfactory throughout. 

School grades from the previous year’s work were
recorded for all Ss. Thus, all correlations between 
grades and test scores reported here are actually 
postdictions. Only English, algebra, geometry, in- 
dustrial arts, home economics, and science grades 
were used in the analysis. History, social studies, 
foreign languages, etc. were not taken by sufficient 
numbers of Ss to warrant the use of product mo- 
ment correlations, which were used exclusively in 
this study. 

The main statistical analysis consisted of: (a)
computing the correlations between the scores of
subtests supposedly measuring the same ability, i.e.,
verbal reasoning of each battery; (b) computing the
correlations between the scores of all tests and grades
in the five subjects mentioned above; and (c) testing
the significance of the difference between correlations
of the comparable subtests and the same school
subjects, i.e., the difference between the correlation
of PMA verbal scores with English grades and the
correlation of DAT verbal scores with English grades.
This was done by converting the r’s to z’s as prescribed
by Fisher and then computing the significance
of the difference between z’s based on the
same sample, with a common variable. All correla- 
tions were corrected for coarse grouping as recom- 
mended by Guilford (7, p. 360). In no case was the 
coefficient raised by more than .03 due to this cor- 
rection. No r’s were based on an N of less than 73, 
while the average N was approximately 125. 
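
The r-to-z step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's own computation: `fisher_z` is Fisher's transformation, while `hotelling_t` is Hotelling's t for two correlations that share one variable within the same sample. Hotelling's t is one standard test for that situation, and it is swapped in here only because the text does not give the exact formula used. The function names and all input values are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher's r-to-z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)

def hotelling_t(r_ay, r_by, r_ab, n):
    """Hotelling's t for the difference between r_ay and r_by.

    Tests a and b are both correlated with the same criterion y in the
    same n cases; r_ab is the correlation between the two tests.
    Returns (t, degrees of freedom).
    """
    # Determinant of the 3 x 3 correlation matrix of a, b, and y.
    det = 1 - r_ay**2 - r_by**2 - r_ab**2 + 2 * r_ay * r_by * r_ab
    t = (r_ay - r_by) * math.sqrt((n - 3) * (1 + r_ab) / (2 * det))
    return t, n - 3

# Hypothetical inputs, chosen only for illustration (not taken from the tables).
r_dat_grade, r_pma_grade, r_dat_pma, n_cases = 0.60, 0.45, 0.50, 125

z_dat = fisher_z(r_dat_grade)
z_pma = fisher_z(r_pma_grade)
t_value, df = hotelling_t(r_dat_grade, r_pma_grade, r_dat_pma, n_cases)

print(f"z(DAT) = {z_dat:.3f}, z(PMA) = {z_pma:.3f}")
print(f"t = {t_value:.2f} on {df} degrees of freedom")
```

Because the two validity coefficients come from the same pupils, a test that ignores the correlation between the two predictors would generally misstate the significance of their difference. That dependence is the reason a dependent-sample procedure is sketched here.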


Results 


The mean DAT test scores for boys and 
girls in this sample agree closely with the 
published norms. Agreement of the mean 
PMA test scores with their norms is poor, 
especially for the number and spatial tests. 
It is apparent that separate norms for each 
sex are needed for the PMA tests. 

Table 1

Correlations Between DAT and PMA Subtest Scores
for 266 11th Grade Children

                                    Correlations
Tests                          Boys      Girls     Total
                              (N=128)   (N=138)   (N=266)
DAT Verbal vs. PMA Verbal      b
DAT Number vs. PMA Number      AT
DAT Space vs. PMA Space        58

Note.—All correlations are significantly different from zero
at the 1 per cent level.


Table 1 presents the correlations between
the scores of the corresponding subtests of
the DAT and PMA. Tables 2 and 3 contain 
the validity coefficients for high school grades 
and the tests of significance of the difference 
between pairs of these coefficients. Of the 
sixty correlations reported in Tables 2 and 3 
only five fail to be significantly different from 
zero. Four of these involve industrial arts 
grades as one variable. All of the correla- 
tions for girls are significant at the 5 per cent 
level or beyond. 

The data in Table 1 are consistent for 
boys, girls, and the combined group in show- 
ing the two verbal tests to be most highly 
correlated, the spatial tests to be next, and 
the number tests to be only moderately correlated.
Since the number and spatial tests
of both batteries have reliabilities fully as
high as or higher than the verbal tests, the order
of these relationships does not appear to be a 
function of the reliabilities of the tests. A 
similar study of 66 boys and 87 girls con- 
ducted in Iowa (2, pp. E86—E87) shows the 
same pattern of relationships although the 
correlations are lower. Here the lower cor- 
relations are probably explained by the fact 
that in an eighth grade group the reliabilities 
of these tests are lower. 

The data bearing on whether the subtests 
are most valid for predicting grades of school 
subjects which are assumed to require the 
ability measured by that particular test are 
surprising in some cases. Regardless of sex, 
all tests are most valid for predicting science 
grades. If the correlations for each subtest 
in Tables 2 and 3 are ranked by size, 10 out 
of 12 first ranks go to science. Industrial 
arts and home economics grades are not predicted
well by any of the tests, with 9 of 12
lowest ranks going to these subjects. The 
number test of the DAT battery is the best 
over-all predictor of academic success. Its 
correlations with all subjects except indus- 
trial arts and home economics are .55 or 
higher. The DAT verbal test is the second 
best over-all predictor, and the PMA verbal 
test is the third best general predictor of academic
success. Both number tests predict
English grades as well as or better than either
of the verbal tests. None of the tests appear 
to be conspicuously superior at predicting 
grades in courses generally assumed to re- 
quire the ability measured by that particular 
test for success, although there is a tendency 
for spatial tests to have higher validities for 
science, geometry, and algebra, whereas the 
verbal tests have higher validities in general 
than the spatial and are relatively more valid 
predictors of English grades. 

Generally the tests appear to function quite 
similarly for the sexes as far as the level 
and ranks of the correlations are concerned. 


Table 2 


Correlations of the DAT and PMA Test Scores with 
School Grades and Significance Tests of the Differ- 
ence Between the Correlations of Corre- 
sponding Subtests with Particular 
Grades for 11th Grade Boys 





DAT PMA 


School 
Subjects N r r 


Verbal Tests 
English iz”: la lO a= 
Algebra ma AY 6 | 6a DAT 
Geometry ja to” 06 3B DAT 
Industrial Arts 115 .05 11 
Science ma 2.3 


DAT 





Number Tests 


- DAT 
sof" DAT 
245° . DAT 
233°*. DAT 
3.86** DAT 


3s**  47** 
32** 
62** —30* 
— 06 
.46** 


English 128 
Algebra 126 
Geometry 73 
Industrial Arts 115  .18* 


.69** 


-_ 


Science 127 
Spatial Tests 
English 
Algebra 
Geometry 


a OT 2.58°* DAT 
or ae Sa DAT 
a" ae 1.18 DAT 
—_— Coa 2.84** DAT 
a «ac wae”. DAT 


Industrial Arts 
Science 





* Significant at 5 per cent level of confidence. 
** Significant at 1 per cent level of confidence. 


Table 3 


Correlations of the DAT and PMA Test Scores with 
School Grades and Significance Tests of the Differ- 
ence Between the Correlations of Corre- 
sponding Subtests with Particular 
Grades for 11th Grade Girls 








DAT PMA 
School
Subjects N r r CR 


Verbal Tests 





English in £2" 2 
Algebra is } A 6hCA 
Geometry Ws) 62 
so)6 | ae" 
66* 66" 





Home Economics 132 
Science 137 








Number Tests 
_58** 44** 
a” te 
CT? 343” 433° DAT 
i 3a 2” BAT 
isa a AB" 798" DAT 








English 139 
Algebra 138 
Geometry 77 
Home Economics 132 
Science 


rE yg 
2.10* 


DAT 
DAT 





Spatial Tests 
English a ww 2 
Algebra iss . 2o” tse 
” a6 Ce 
36" 
AT 





1.39 DAT 
ar — — 


45** - — 


Geometry 
Home Economics 132 
3 137 


Science 


* Significant at 5 per cent level of confidence. 
** Significant at 1 per cent level of confidence. 


There is a slight but consistent tendency for 
all tests to show higher validities for the girls. 
The critical ratios in Tables 2 and 3 give 
evidence that the DAT battery has higher 
validity in every case where there is a sig- 
nificant difference between the validities of 
the subtests of the two batteries. For the 
boys, 9 of the 15 possible differences are sig- 
nificant at the 5 per cent level or better. 
The number and the spatial tests of the DAT 
battery do a significantly better job of predic- 
tion in general, while the two verbal tests are 
quite comparable. For the girls, there are 
only 5 differences significant at the 5 per cent 
level or better. All 5 of these are between 
the number tests. Neither battery’s verbal or
spatial tests proved to be significantly
superior as predictors for girls. A tally of
any difference of .05 or larger between the 
validities of corresponding subtests shows all 
21 of the differences favoring the DAT. 
The degree of correlation expressed in the 
validity coefficients found in this study is in 
general agreement with those found in other
studies using these two batteries on separate
groups (2, pp. E13—E83; 3, 5, 6, 10, 14, 16). 
Our findings are also in agreement with these 
studies in showing that the DAT number test 
tends to be as good as or better than the DAT
verbal test as an over-all predictor of aca- 
demic achievement in this battery, while in 
the PMA battery the verbal test is the best 
over-all predictor. 

The moderate to substantial correlations of
the verbal and number tests of both batteries
with all subjects except industrial arts and
home economics demonstrate some potential
for predicting academic success in general,
but throw doubt on the immediate usefulness
of the various subtests as differential
predictors for various subject matters. This
does not preclude the possibility that weighting
of the various subtests in a multiple regression
equation would result in a better
prediction of over-all academic achievement
than is obtained from single score tests.
There is some evidence to suggest this may 
be the case in the work of Shanner and Kuder 
(10) and Edgerton and Ellison (5). 
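
The possible gain from weighting subtests can be illustrated with the standard formula for the multiple correlation of a criterion with two predictors. The sketch below is only an illustration: the three correlations are hypothetical values, not coefficients from Tables 2 and 3, and the function name is invented for the example.

```python
import math

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of criterion y with predictors 1 and 2.

    r_y1, r_y2: validity coefficients of the two predictors.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)

# Hypothetical values chosen only to show the arithmetic.
best_single = 0.60          # validity of the better single test
second_test = 0.50          # validity of the second test
between_tests = 0.40        # correlation between the two tests

combined = multiple_r_two_predictors(best_single, second_test, between_tests)
gain = combined - best_single

print(f"R for the weighted pair = {combined:.3f}")
print(f"gain over the best single test = {gain:.3f}")
```

The gain shrinks as the correlation between the two predictors rises, which is why highly intercorrelated subtests add little to one another in a regression equation.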


Summary and Conclusions 


A comparison was made between the use- 
fulness of the DAT and the PMA test bat- 
teries as predictors of academic achievement 
in high school. The Ss were 139 girls and 
128 boys in the eleventh grade of a small, 
city high school. A counterbalanced design 
was effective in nullifying practice effects. 
The main conclusions are: 

1. The corresponding subtests of the two 
batteries show moderate to substantial inter- 
correlations. The verbal tests are measuring 
a similar aptitude and the spatial tests show 
moderate intercorrelations, whereas the num- 
ber tests show considerable independence. 

2. The tests do not generally predict best 
in the subject usually assumed to be meas- 
ured by that test. All tests show their great- 
est effectiveness in the prediction of science, 
geometry, and algebra grades. 

3. The DAT shows higher validities in gen- 
eral than the PMA. The DAT number test 
is significantly superior to the PMA number 
test as a predictor of high school grades for 
both sexes. 

In concluding it should be pointed out that
school grades are only one criterion by which
to judge the usefulness of these two batteries.
As mentioned earlier, these batteries were de- 
signed for the prediction of multiple criteria. 
They may also be evaluated as vocational 
aptitude tests using vocational criteria other 
than school grades. Work is progressing in 
this area. 


Received March 23, 1954. 


References 


1. Bennett, G. K., & Doppelt, J. E. The evaluation
   of pairs of tests for guidance use. Educ.
   psychol. Measmt, 1948, 8, 319-325.

2. Bennett, G. K., Seashore, H. G., & Wesman,
   A. G. Differential Aptitude Test manual.
   New York: Psychological Corporation, 1947.

3. Bernreuter, R. C., & Goodman, C. H. A study
   of the Thurstone PMA tests applied to freshman
   engineering students. J. educ. Psychol.,
   1941, 32, 55-60.

4. Buros, O. K. (Ed.) Fourth mental measurements
   yearbook. Highland Park, N. J.: Gryphon
   Press, 1953.

5. Edgerton, H. A., & Ellison, M. L. The Thurstone
   PMA tests and college marks. Educ.
   psychol. Measmt, 1941, 1, 399-406.

6. Goodman, C. H. Prediction of college success
   by means of Thurstone’s PMA tests. Educ.
   psychol. Measmt, 1944, 4, 125-140.

7. Guilford, J. P. Fundamental statistics in psychology
   and education. (2nd Ed.) New
   York: McGraw-Hill, 1950.

8. Lindquist, E. F. Statistical analysis in educational
   research. Boston: Houghton Mifflin,
   1940.

9. Segal, D. Differential diagnosis of ability in
   school children. Baltimore: Warwick & York,
   1934.

10. Shanner, W. M., & Kuder, G. F. A comparative
    study of freshman week tests given at
    the University of Chicago. Educ. psychol.
    Measmt, 1941, 1, 85-92.

11. Super, D. E. Appraising vocational fitness by
    means of psychological tests. New York:
    Harper, 1949.

12. Thurstone, L. L. Primary mental abilities.
    Psychometric Monogr., 1938, No. 1.

13. Thurstone, L. L., & Thurstone, T. G. Chicago
    tests of primary mental abilities; manual of
    instructions. Chicago: Science Research
    Assoc., 1943.

14. Tredick, V. D. The Thurstone PMA test and
    a battery of vocational guidance tests as predictors
    of academic success. Unpublished
    doctor’s dissertation, Penn. State Coll., 1941.

15. Wesman, A. G., & Bennett, G. K. Problems of
    differential prediction. Educ. psychol. Measmt,
    1951, 11, 265-272.

16. Yum, K. S. Primary mental abilities and scholastic
    achievement in the divisional studies of
    the University of Chicago. J. appl. Psychol.,
    1941, 25, 712-720.





The Journal of Applied Psychology 
Vol. 39, No. 2, 1955 


An Analysis of the Predictive Value of the Pre-Engineering
Ability Test ¹


Frank Q. Sessions


University of Idaho ²


Currently, the engineering candidates at 
the University of Idaho must complete a bat- 
tery of tests as part of their entrance require- 
ments. Upon the basis of these tests, the 
candidates are either encouraged to enroll in
the College of Engineering or discouraged from doing so.
The efficacy of the battery, however, is not 
known, primarily because one of the tests, 
the Pre-Engineering Ability Test, was ad- 
ministered for the first time in the fall of 
1953. For this reason, it was the purpose of 
this study to try to ascertain the effective- 
ness of a battery of tests in the selection and 
counseling of beginning engineering students. 
Specifically, the questions which were pro- 
posed for investigation were as follows: 

1. Does the Pre-Engineering Ability Test make
any significant contribution to the tests al- 
ready administered in the general current 
testing program? 

2. Do the profile and the scattergram tech- 
niques reveal any meaningful relationships not 
shown by the correlation analysis? 

3. Does an analysis of the results warrant 
the continued use of the battery in part or in 
its entirety? 

The historical background. The Pre-Engi- 
neering Ability Test was derived from the 
Pre-Engineering Inventory. When composed 
of seven tests, the Pre-Engineering Inven- 
tory yielded multiple-correlation coefficients 
of over .65 with engineering grades in a number
of studies (5, 8, 9). Studies by Jex (5)
and others (6, 8, 9), however, revealed that
the use of two of the seven subtests of the
Inventory was practically as effective as the
use of all seven variables. For this reason,
and because the time and cost of administering
the battery were prohibitive, the most recent
revision of the Inventory has excluded
all of the tests but two: the General Mathematical
Ability Test and the Ability to Comprehend
Scientific Material Test. Less than
half of the items of these two subtests were
retained to form the Pre-Engineering Ability
Test.


¹ The assistance of Dr. William H. Boyer is
greatly appreciated.
² Now at the University of Utah.

Method. The subjects were 148 engineering
candidates who took the tests at the Student
Counseling Center in the fall of 1953.
The procedure was to compare the first se- 
mester’s grade-point average of each student 
with the scores he received on the tests. The 
data were analyzed in terms of the correla- 
tions, profiles, and scattergrams. In addition 
to the two parts of the Pre-Engineering Abil- 
ity Test, the Cooperative General Achieve- 
ment Mathematics Test and both parts of the 
American Council on Education Psychologi- 
cal Examination were included in this study. 


Interpretation and Discussion 


Correlational analysis. It will be noted 
from Table 1 that when the Doolittle method 
was employed, a multiple-correlation coeffi- 
cient of .60 was attained between the 
weighted combination of tests and the first 
semester grade-point average. It is shown 
in Table 2 that the highest zero-order cor- 
relation was attained with the Cooperative 
Mathematics test with .56. The difference 
between the multiple-correlation coefficient of 
.60 and the zero-order correlation coefficient 
of .56 for the Cooperative Mathematics test 
was not significant. In Table 1, it is seen 
that the standard error of the correlation for 
the entire battery was .05; by chance alone, 
under similar circumstances, two-thirds of all 
the succeeding coefficients would be expected 
to fall between .55 and .65. Thus, if corre- 
lations were the only consideration, it would 
be almost as effective and much more eco- 
nomical to use the Cooperative Mathematics 
test alone in preference to the entire battery. 
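
The standard error quoted above can be checked with a short calculation. The sketch assumes the classical large-sample approximation (1 - r²)/√N; the article does not state which formula was used, but with N = 148 this approximation gives .053, .055, and .056 for coefficients of .60, .57, and .56, which agrees with the standard errors that remain legible in Table 1. The function name is hypothetical.

```python
import math

def correlation_standard_error(r, n):
    """Large-sample standard error of a correlation: (1 - r^2) / sqrt(n).

    Assumed formula; the article does not say which one was applied.
    """
    return (1 - r**2) / math.sqrt(n)

R, N = 0.60, 148                      # multiple correlation and sample size from the text
se = correlation_standard_error(R, N)
band = (R - se, R + se)               # one standard error either side of R

print(f"standard error = {se:.3f}")
print(f"one-standard-error band = {band[0]:.3f} to {band[1]:.3f}")
print("zero-order .56 inside band:", band[0] < 0.56 < band[1])
```

The band of one standard error on either side of .60 contains the zero-order coefficient of .56, in line with the statement that the two coefficients do not differ significantly.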

As shown in Table 2, the ACE—Q test
correlated .45 with GPA while the ACE—L
had a coefficient of only .42. In view of
previous studies (1, 2, 3, 4, 7, 10), it was to
be expected that the ACE would correlate
only slightly with semester grades.


Table 1

Combinations of Tests That Have the
Highest Correlations

Grades with Tests    Correlations    Standard Error of Correlation
R0(12345)            .60             .053
R0(...)              .57             .055
R0(...)              ...             .055
r0(3)                .56             .056


Table 2 also shows that the Pre-Engineer- 
ing Science test correlated .45 and the Pre- 
Engineering Mathematics test correlated .49 
with grade-point averages. It is somewhat 
surprising to discover that the two parts of 
the Pre-Engineering Ability Test did not 
show higher validity coefficients. Jex (5), 
at the University of Utah, reported a .60 for 
the mathematics part of the old Pre-Engi- 
neering Inventory and .55 for the science 
comprehension part. He used first-quarter 
grades as his validity criterion. Johnson (6) 
and others (8, 9) reported results almost 
identical to those of Jex. 

One reason for this difference may be found 
in the reduction of items of the Pre-Engineer- 
ing Ability Test. All the items on the new 
Pre-Engineering Ability Test were selected 
from the old Pre-Engineering Inventory, but 
the number of items on the former has been 
cut to less than half of the original. It is 
probable, therefore, that the reduction of 
items per se has contributed to the diminution
of the validity coefficient of the Pre-Engineering
Ability Test. Also, a cursory check
of the distribution of scores (Table 2) indicates
that there may be a loss of sensitivity.
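
The likely size of the effect of shortening a test can be estimated from the Spearman-Brown formula and the related formula for validity as a function of test length. The sketch below is illustrative only. The reliability of the original Inventory part is not reported in this article, so the value of .90 is purely an assumption, and a length factor of one half is used as an upper bound for "less than half." The starting validity of .60 is the figure reported by Jex for the mathematics part.

```python
import math

def spearman_brown(reliability, k):
    """Reliability of a test whose length is multiplied by k."""
    return k * reliability / (1 + (k - 1) * reliability)

def validity_after_length_change(validity, reliability, k):
    """Validity of a test whose length is multiplied by k.

    Assumes the added or removed items are parallel to the rest.
    """
    return validity * math.sqrt(k / (1 + (k - 1) * reliability))

original_validity = 0.60     # Jex's coefficient for the mathematics part (from the text)
assumed_reliability = 0.90   # hypothetical; not reported in the article
length_factor = 0.5          # "less than half" of the items, taken here as one half

short_reliability = spearman_brown(assumed_reliability, length_factor)
short_validity = validity_after_length_change(
    original_validity, assumed_reliability, length_factor
)

print(f"reliability of shortened test = {short_reliability:.3f}")
print(f"expected validity of shortened test = {short_validity:.3f}")
```

Under these assumed values the expected validity falls only a few hundredths below .60, so shortening alone may not account for the whole difference from the earlier studies. A lower starting reliability or a larger cut in items would produce a larger drop.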

From Table 3, it will be noted that the Co- 
operative Mathematics test yielded by far the 
highest partial regression coefficient. The 
beta coefficient was .3718; the ACE—L test 
was second in magnitude with .1507. Appar- 
ently, both parts of the Pre-Engineering Abil- 
ity Test and the ACE—Q contributed but 
little to the prediction of success in engineer- 
ing. It is not surprising to learn that this re- 
lationship exists, because the intercorrelations 
were high. Table 2 reveals that the correla- 
tion between the Cooperative Mathematics 
test and the Pre-Engineering Mathematics 
test was especially high. Pierson and Jex 
(9) also reported the high correlation of .76 
between the Cooperative Mathematics test 
and the Pre-Engineering Mathematics test. 
It is obvious that the Cooperative Mathe- 
matics test measured much of what the Pre- 
Engineering Mathematics test purports to 
measure. 
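
The beta weights and the zero-order validities can be tied back to the multiple correlation through the identity R² = Σ βᵢ r₀ᵢ. The sketch below applies it to the weights in Table 3 and the validity coefficients quoted in the text. One assumption is made: the weight 0.0329 is assigned to the ACE-Q, the only test in the battery not otherwise accounted for, which agrees with the statement that the ACE-Q contributed little.

```python
import math

# Zero-order correlations with grade-point average, as quoted in the text.
validities = {
    "ACE-Q": 0.45,
    "ACE-L": 0.42,
    "Cooperative Math.": 0.56,
    "Pre-Engineering Science": 0.45,
    "Pre-Engineering Math.": 0.49,
}

# Beta weights from Table 3 (assignment of 0.0329 to ACE-Q is an assumption).
betas = {
    "ACE-Q": 0.0329,
    "ACE-L": 0.1507,
    "Cooperative Math.": 0.3718,
    "Pre-Engineering Science": 0.0802,
    "Pre-Engineering Math.": 0.0675,
}

# R^2 is the sum of each beta weight times the matching validity coefficient.
r_squared = sum(betas[test] * validities[test] for test in betas)
multiple_r = math.sqrt(r_squared)

print(f"R squared = {r_squared:.4f}")
print(f"R = {multiple_r:.3f}")
```

The result rounds to .60, the multiple correlation reported for the full battery. Well over half of R² comes from the Cooperative Mathematics term alone.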

Vaughn (13, p. 579) stated that unless a 
test correlates at least .50 with the validating 
criterion, it has little guidance possibilities. 
In Table 2 it is seen that the Cooperative 
Mathematics test was the only variable to 
yield a correlation coefficient in excess of .50. 

Profile analysis. The following qualitative
analysis of the data shows that Vaughn’s
assumption (i.e., low validity correlations
necessarily invalidate a test) is far from
correct. These findings seem to support Slaymaker’s
(11) suggestion that more meaningful
results can be obtained from analyzing
profile patterns than from correlations. In
Fig. 1, the profiles show little difference between
the average of the passing group and
the average of the failing group when the
Pre-Engineering Ability Test was used. The
same graph shows that the Cooperative
Mathematics test differentiated between those
who failed and those who passed by 0.12
standard units, while the ACE test differentiated
by 0.38 standard unit. These meaningful
data were obtained with only those
students who attained a score of less than
-0.60 standard unit, or raw scores of 91 on
the ACE test, 58 on the Cooperative Mathematics
test, and 24 on the Pre-Engineering
Ability Test. Consequently, only the lower
range of the distribution of scores was included.
As would be surmised from the correlation
coefficients, these differential relationships
were not obtained when the entire
range of scores was employed.


Table 2

Intercorrelations, Means, and Standard Deviations

Variable                        (0)   (1)   (2)   (3)   (4)   (5)    Mean
(0) Grade-point average               .45   .42   .56   .45   .49    2.33
(1) ACE—Q                                   .56   .65   ...   .66   43.87
(2) ACE—L                                         .42   ...   .53   63.26
(3) Cooperative Math.                                   ...   .73   61.90
(4) Pre-Engineering—Science                                   .61   17.92
(5) Pre-Engineering—Math.                                           14.73


Table 3

Partial Regression Coefficients

Test                        Beta No.    Weights
ACE—Q                       β1          0.0329
ACE—L                       β2          0.1507
Cooperative Math.           β3          0.3718
Pre-Engineering—Science     β4          0.0802
Pre-Engineering—Math.       β5          0.0675


Fig. 1. Profile of the passing and the failing
group for those who attained less than -0.62 z
scores on the tests.


Figure 1 also shows that those students 
who failed attained an average standard score 
of — .962 or a raw score of 85 on the ACE 
test. Using 85 as the critical score, then, it 
was found that only two of 16 passed who 
attained a score of less than 85 on the ACE 
test. Even though the ACE test had a com- 
paratively low correlation with success in the 
College of Engineering, it provided the best 
test for determining critical scores. 





Fig. 2. Scattergram of ACE test scores and grade-point average.*
[Scattergram: ACE test scores on the horizontal axis, grade-point average on the vertical axis.]

* A score of 85 = -0.962 z-scores.



The suggestion that the ACE test provided 
a more useful critical score than the Coopera- 
tive Mathematics test was further substan- 
tiated by the records of the engineering can- 
didates who took the tests in the fall of 1950. 
After three and a half years, only one of the 
surviving 29 students received a score of less 
than 90 on the ACE test; on the other hand, 
there were four who attained less than a com- 
parable score of 59 on the Cooperative 
Mathematics test. 

Scattergram analysis. The outcome of the 
use of the profile technique was verified by 
the use of Thurstone’s scattergram method. 
As early as 1919, Thurstone (12) advised 
that critical scores obtained from scatter- 
grams should be used in preference to corre- 
lation coefficients. He pointed out that the 
critical score technique was more useful for 
practical prognostic purposes than the multi- 
ple correlation method because of the ease of 
application and because it differentiated be- 
tween those who failed or passed, though it 
did not predict actual grades. Jex (5) and
Slaymaker (11) independently cited evidence
that supported Thurstone’s observations.
When scattergrams were utilized, not only
did the ACE test serve as the most effective
single test for determining critical scores, but 
it also served better than the composite scores 
of the battery or any combination thereof. 
In Fig. 2, it is shown that an individual who 
scored less than 85 on the ACE test had only 
a 13 per cent chance of succeeding in the en- 
gineering curriculum. 
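
The critical-score reasoning reduces to a simple proportion: of the students scoring below the cutoff, what fraction passed? The sketch below shows the computation. The individual records are synthetic and are constructed only so that 16 students fall below 85 and two of them pass, as reported above; the function name is hypothetical.

```python
def pass_rate_below(scores, passed, cutoff):
    """Proportion of students scoring below `cutoff` who passed.

    scores: test scores; passed: matching True/False outcomes.
    Returns None when no student scores below the cutoff.
    """
    outcomes = [ok for score, ok in zip(scores, passed) if score < cutoff]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)

# Synthetic records: 16 students below 85 (two passing), four at or above 85.
low_scores = [68 + i for i in range(16)]        # 68 through 83
low_passed = [False] * 14 + [True] * 2
high_scores = [85, 96, 110, 124]
high_passed = [True, True, False, True]

scores = low_scores + high_scores
passed = low_passed + high_passed

rate = pass_rate_below(scores, passed, cutoff=85)
print(f"pass rate below 85 = {rate:.1%}")
```

Two passes among 16 students is 12.5 per cent, which is the figure the text reports as 13 per cent. Unlike a correlation coefficient, this proportion depends only on the cases below the cutoff.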


Conclusions 


1. The multiple-correlation coefficient of 
.60 was not significantly higher than the best 
zero-order coefficient of .56, which was at- 
tained with the Cooperative General Achieve- 
ment Mathematics Test. 

2. Profile patterns, as well as scattergrams, 
revealed that even though the ACE test cor- 
related lower than the other tests with engi- 
neering grades, it served as the best test for 
determining critical scores. 




3. Both the beta weights and the profile 
scores supported the fact that the Pre-Engi- 
neering Ability Test has little selective value 
compared with other tests in common use. 


Received May 14, 1954. 


References 


1. Berdie, R. F., Dressel, P., & Kelso, P. Relative
   validity of the Q and L scores of the ACE
   Psychological Examination. Educ. psychol.
   Measmt, 1951, 11, 803-812.

2. Berdie, R. F., & Sutter, Nancy A. Predicting
   success of engineering students. J. educ.
   Psychol., 1950, 41, 184-190.

3. Coleman, W. An economical test battery for
   predicting freshman engineering course grades.
   J. appl. Psychol., 1953, 37, 465-467.

4. Cooprider, H. A., & Laslett, H. R. Predictive
   values of the Sanford Scientific and the Engineering
   and Physical Science Aptitude tests.
   Educ. psychol. Measmt, 1948, 8, 683-687.

5. Jex, F. B. The predictive value of the Pre-Engineering
   Inventory. Unpublished master’s thesis,
   Univer. of Utah, 1947.

6. Johnson, A. P. College Board Mathematical
   Test and the Pre-Engineering Inventory predict
   scholastic success in colleges of engineering.
   Amer. Psychologist, 1950, 5, 353. (Abstract)

7. Laycock, S. R., & Hutcheon, N. B. A preliminary
   investigation into the problem of measuring
   engineering aptitude. J. educ. Psychol.,
   1939, 30, 280-288.

8. Lord, F., Cowles, J. T., & Cynamon, M. The
   Pre-Engineering Inventory as a predictor of
   success in engineering colleges. J. appl.
   Psychol., 1950, 34, 30-39.

9. Pierson, G. A., & Jex, F. B. Using the Cooperative
   General Achievement tests to predict success
   in engineering. Educ. psychol. Measmt,
   1951, 11, 397-402.

10. Remmers, H. H., Elliott, D. N., & Gage, N. L.
    Curricular differences in predicting scholastic
    achievement: applications to counseling. J.
    educ. Psychol., 1949, 40, 385-394.

11. Slaymaker, R. R. Admission test procedure. J.
    engng Educ., 1946-1947, 37, 402-413.

12. Thurstone, L. L. Mental tests for college entrance.
    J. educ. Psychol., 1919, 10, 129-142.

13. Vaughn, K. W. The Yale Scholastic Aptitude
    tests as predictors of success in the college of
    engineering. J. engng Educ., 1943-1944, 34,
    572-582.





The Journal of Applied Psychology
Vol. 39, No. 2, 1955


The Relationship Between Minnesota Teacher Attitude In- 
ventory Scores and Scores on Certain Scales of the 
Minnesota Multiphasic Personality Inventory * 


Walter W. Cook 


University of Minnesota 


and Donald M. Medley 


Indiana University 


Despite limited evidence as to its validity 
for such purposes, the Minnesota Multiphasic 
Personality Inventory (5) has been used 
rather extensively as an instrument in coun- 
seling college students. The present study is 
the second of a series designed to shed some 
light on the usefulness of the MMPI in coun- 
seling students interested in becoming class- 
room teachers. 

The Minnesota Teacher Attitude Inventory 
(MTAI) is an empirically constructed scale 
whose validity for predicting the ability of 
classroom teachers to establish and maintain 
harmonious relationships with their pupils 
has been well established (2). When the 
MTAI was standardized, it was administered 
to a stratified random sample of all of the 
classroom teachers in the state of Minnesota
(1). After the majority of the returns were 
in, the Minnesota Multiphasic Personality 
Inventory (MMPI) was administered to ap- 
proximately 200 teachers representing the 
highest and lowest 8 per cent in terms of 
MTAI scores. In a previous study two new 
MMPI scales were developed for use in coun- 
seling (3). In the present study the rela- 
tionships between scores on the “clinical” 
MMPI scales and MTAI scores will be analyzed
to see whether any specific suggestions
can be made for counselors attempting to in- 
terpret MMPI profiles of college students in- 
terested in becoming teachers. 


Procedure 


In the standardization of the MTAI, inventories
were mailed to approximately 2,000 Minnesota
teachers. When 1,762 of the teachers contacted had
returned usable MTAI’s, copies of the MMPI were
mailed out to the 144 (8.2 per cent) scoring highest,
and the 147 (8.3 per cent) scoring lowest. Usable
MMPI answer sheets were returned by 112 (86.8
per cent) of the high-scoring group and 100 (75.8
per cent) of the low-scoring group. There were 19
males and 93 females among the “highs” and 26
males and 74 females among the “lows.” These two
groups are the “high” and the “low” groups with
which the present study is concerned.


* This study was made possible by a grant from
the research funds of the Graduate School of the
University of Minnesota.

The 212 MMPI’s were scored on the following 
keys: K, F, L, ?, Hs (hypochondriasis), D (depres- 
sion), Hy (hysteria), Pd (psychopathic deviate), 
Mf (masculinity-femininity), Pa (paranoid), Pt (psy- 
chasthenia), Sc (schizophrenia), Ma (hypomania), 
and Si (social introversion). Three of the inven- 
tories (all “highs”) had significant elevations on the 
K scale. These three cases were omitted in those 
portions of the study involving the K correction, 
and included in the other portions. 

The analysis of the data involved three steps: 


1. Mean T scores for both sexes and both levels 
on each of the scales were computed for the 209 
teachers with K scores below 70, and the differences
tested for each sex separately with an ordinary
t test.

2. A highly significant difference on the K scale 
was observed in step 1. Since this scale is used as 
a correction factor for five of the other keys (Hs, 
Pd, Pt, Sc, and Ma), a separate analysis of the raw 
scores (before the K correction was applied) on 
these five keys was made. Wiener (7) has devel- 
oped two separate keys each for the D, Hy, Pd, Pa, 
and Ma scales, one of each pair containing “subtle” 
items and the other “obvious” ones. There is some 
evidence that the subtle keys are more valid among 
normal adults than are the obvious ones. An analy- 
sis of the scores of the 212 teachers on these ten 
keys was also made. Since the results of step 1 also
suggested the possibility of interaction between sex 
and rapport, the second step employed a two-way 
analysis of variance design, with sex and level of 
rapport as measured by the MTAI as the two bases 
of classification. 

3. The profiles of the 212 teachers were plotted 
and coded according to Hathaway’s system (4), and 
then sorted to see whether codings were related to 
MTAI scores.
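
The F ratios of the two-way design in step 2 follow directly from the sums of squares: each effect has one degree of freedom, and its mean square is divided by the error mean square. The sketch below illustrates the arithmetic with the sums of squares read from the second row of Table 2 (taken here to be the symptomatic depression scale, which is an assumption about how the extracted rows line up) and the 208 error degrees of freedom given in the table. The function name is hypothetical.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F ratio: effect mean square divided by error mean square."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error

# Sums of squares as read from the second row of Table 2 (row assignment assumed).
ss_rapport = 114.7
ss_interaction = 38.2
ss_error = 3433.7
df_error = 208          # error degrees of freedom stated in Table 2

f_rapport = f_ratio(ss_rapport, 1, ss_error, df_error)
f_interaction = f_ratio(ss_interaction, 1, ss_error, df_error)

print(f"F for rapport = {f_rapport:.2f}")
print(f"F for sex x rapport interaction = {f_interaction:.2f}")
```

These values agree with the 6.95 and 2.31 printed in the F Ratios columns, which supports this reading of the table layout.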





Table 1

Mean T Scores on Certain MMPI Scales of 209 Teachers, Classified by Sex and by Level of
Pupil-Teacher Rapport as Measured by the MTAI

               Males                Females            Differences         t Ratios
          High      Low        High      Low
         Rapport   Rapport    Rapport   Rapport
Scale    (N=18)    (N=26)     (N=92)    (N=73)      Males   Females     Males   Females
K         63.8      49.8       ...       53.6        ...     +7.3       5.52*     ...
Hs        51.8      51.3       ...       49.9        ...     +0.5       0.20      ...
D         53.2      53.7       ...       52.3        ...     -3.3       0.18      ...
Hy        55.7      53.1       ...       53.2        ...     +2.5       1.52      ...
Pd        56.1      49.8       ...       52.4        ...     +1.1       2.51      ...
Mf        62.6      60.6       ...       51.1        ...     -6.2       0.58      ...
Pa        53.7      50.5       ...       54.1        ...     -1.3       1.12      ...
Pt        52.9      51.8       ...       51.8        ...     -1.9       0.36      ...
Sc        54.6      50.3       ...       51.5        ...     +0.2       1.32      ...
Ma        52.2      55.5       ...       51.4        ...     -0.3       1.04      ...
Si        44.6      50.6       ...       55.8        ...     -5.8       1.70      ...

* Difference significant at 1% level.
Note.—A high score on the MTAI indicates high rapport with pupils and a high score on the MMPI scale indicates a high
degree of the trait measured by that scale.


Table 2

Analyses of Variance of Scores on Certain MMPI Scales of 212 Minnesota Teachers Classified According to Sex
and According to Level of Pupil-Teacher Rapport as Measured by the MTAI

                                       Sums of Squares                       Mean Squares                  F Ratios
Scale                       Rapport  Interaction      Error      Total   Rapport  Interaction   Error   Rapport  Interaction
Hypochondria                  183.0         41.7    2,057.6    2,893.3     183.0         41.7    12.9    14.19*         3.23
Symptomatic depression        114.7         38.2    3,433.7    3,628.0     114.7         38.2    16.5     6.95*         2.31
  Subtle                       87.6         22.3    1,298.3    1,437.0      87.6         22.3     6.2    14.03*         3.91
  Obvious                     402.7          2.1    2,460.9    2,866.8     402.7          2.1    11.8    34.04*          ...
Hysteria                      109.8          1.3    3,023.7    3,373.2     109.8          1.3    14.5       ...          ...
  Subtle                      623.6         13.1    2,963.1    3,780.7     623.6         13.1    14.2    43.77*          ...
  Obvious                     210.1          6.2    1,829.7    2,050.0     210.1          6.2     8.8    23.88*          ...
Psychopathic deviation         25.3          1.6    3,082.4    3,112.9      25.3          1.6    14.8      1.71          ...
  Subtle                       98.9          3.7    1,221.1    1,332.9      98.9          3.7     5.9    16.84*          ...
  Obvious                     224.3          0.1    1,927.0    2,175.7     224.3          0.1     9.3    24.21*          ...
Paranoia                        0.3         24.9    2,002.0    2,030.6       0.3         24.9     9.6       ...          ...
  Subtle                      127.1         35.8    1,031.8    1,197.5     127.1         35.8     5.0       ...          ...
  Obvious                     140.6          1.0      913.6    1,055.2     140.6          1.0     4.4    32.01*          ...
Psychasthenia               1,198.7         47.0    7,394.0    8,650.0   1,198.7         47.0    35.9    33.40*          ...
Schizophrenia                 405.6         77.6    6,425.1    6,985.0     405.6         77.6    31.2    13.00*          ...
Hypomania                      46.0         18.2    2,754.7    2,958.1      46.0         18.2    13.2      3.47          ...
  Subtle                       10.0          3.6    1,305.6    1,341.8      10.0          3.6     6.3      1.60          ...
  Obvious                      99.0          5.6      973.1    1,127.3      99.0          5.6     4.7       ...          ...

Degrees of freedom                1            1      208**      211**

* Significant at 1% level.
** Degrees of freedom for Hy, Pt, and Sc are 206 and 209 for error and total, respectively.



Results 


1. The results of the analysis of the T
scores with K corrections are presented in
Table 1. Differences significant at the 1 per
cent level between low-scoring and high-scoring
females were observed on the K, D, Mf,
and Si scales, and between high- and low-scoring
males on the K scale only. Teachers
scoring high on the MTAI tend to have extremely
high K scores, while those scoring
low tend to have K scores near the mean of
normal adults. Female teachers scoring low
on the MTAI tend to score higher on the D,
Mf, and Si scales than those who score high
on the MTAI.

2. Table 2 summarizes the results of the
analyses of variance of the raw scores on
subtle, obvious, and total keys. Differences
significant at the 1 per cent level between
teachers scoring high and low on the MTAI
were found for all scales except three: Pd,
Ma, and MaS. Significant interaction between
sex and rapport was found for the
PaS key only.
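
The relation between the mean squares and the F ratios in Table 2 can be illustrated with a short sketch. Each F ratio is the mean square for an effect divided by the error mean square. The numeric inputs below are copied from Table 2; the function names and the critical value of about 6.76 (an approximate tabled 1 per cent point for 1 and roughly 200 degrees of freedom) are assumptions added for illustration and do not come from the original analysis.

```python
# Sketch: F ratio = effect mean square / error mean square.
# Inputs are copied from Table 2. F_CRIT_01 is an assumed, approximate
# tabled value for 1 and about 200 degrees of freedom at the 1% level.

F_CRIT_01 = 6.76


def f_ratio(ms_effect: float, ms_error: float) -> float:
    """Return the F ratio for one effect in the analysis of variance."""
    if ms_error <= 0:
        raise ValueError("error mean square must be positive")
    return ms_effect / ms_error


def is_significant(f_value: float, critical: float = F_CRIT_01) -> bool:
    """Compare an F ratio with the assumed 1% critical value."""
    return f_value >= critical


# Hypochondriasis: rapport and interaction effects (error MS = 12.9)
hs_rapport = f_ratio(183.0, 12.9)
hs_interaction = f_ratio(41.7, 12.9)
# Psychopathic deviation: rapport effect (error MS = 14.8)
pd_rapport = f_ratio(25.3, 14.8)

print(round(hs_rapport, 2), is_significant(hs_rapport))          # 14.19 True
print(round(hs_interaction, 2), is_significant(hs_interaction))  # 3.23 False
print(round(pd_rapport, 2), is_significant(pd_rapport))          # 1.71 False
```

The three results agree with the corresponding entries printed in Table 2, which suggests that the F ratios were formed in this way.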

Table 3 gives the mean raw scores for the 
keys analyzed in Table 2. 

Of the five keys with the K correction (Hs, 
Pd, Pt, Sc, and Ma), none of which showed 
significant differences in Table 1, three show 
significant differences when raw scores are 
analyzed—Hs, Pt, and Sc—with the low- 
rapport teachers higher than the high-rapport 
teachers in each case. The raw score analy- 
sis also shows significant differences for both 
males and females on the Hy scale with the 
high-rapport teachers higher. 

Within each of the five scales in which 
subtle and obvious items are differentiated 
(D, Hy, Pd, Pa, and Ma), high-rapport 
teachers are uniformly high on subtle items, 
and low-rapport teachers are uniformly high 
on the obvious items, the differences being 
significant for every such key except MaS. 

Tables for the T transformation of scores 


Table 3

Mean Raw Scores on Certain MMPI Scales of 212 Minnesota Teachers, Classified by Sex
and by Level of Pupil-Teacher Rapport as Measured by the MTAI

                                                                      Difference Between
                                                                      Low and High
                               Males                Females           Rapport Groups
                           Low       High       Low       High
                         Rapport   Rapport    Rapport   Rapport
Scale                    (N=26)    (N=19)     (N=74)    (N=93)       Males     Females
Hypochondriasis            5.4       1.8        5.4       ...        -3.6*      -1.5*
Symptomatic depression    18.2      18.4       20.4       ...        +0.2       -1.7*
  Subtle                  10.8      13.4       12.2       ...        +2.6*      +1.0*
  Obvious                  7.4       5.0        8.2       5.3        -2.4*      -2.9*
Hysteria                  18.1      19.8       20.6       ...        +1.7*      +1.5*
  Subtle                  13.2      17.6       15.5       ...        +4.4*      +3.3*
  Obvious                  4.9       ...        5.1       ...        -2.7*      -1.8*
Psychopathic deviation     ...      13.6       13.8       ...        -0.2       -0.7
  Subtle                   ...      10.3        9.0       ...        +1.9       +1.3
  Obvious                  ...       ...        4.8       ...        -2.1*      -2.0*
Paranoia                   ...       9.5        9.3       ...        +1.3       -0.3
  Subtle                   ...       8.9        6.7       ...        +3.2**     +1.2**
  Obvious                  ...       0.6        2.6       ...        -1.9*      -1.5*
Psychasthenia              ...       4.7        ...       6.9        -6.9*      -4.3*
Schizophrenia              ...       4.9        7.7       5.6        -5.2*      -2.1*
Hypomania                  ...      13.3       12.8      12.3        -2.0       -0.5
  Subtle                   ...       9.2        8.1       8.7         0.0       +0.6
  Obvious                  ...       4.1        4.7       3.6        -2.0*      -1.1*

* Difference between high-rapport and low-rapport groups is significant at the 1% level and independent of sex.
** Difference between high-rapport and low-rapport groups is significant at the 1% level for both sexes although there is an
interaction with sex.







on the subtle and obvious keys were available
for males. T scores on these keys and,
for comparison, on all of the other keys
(without K corrections) for the 45 males in
our sample are shown in Table 4. In the
column of T scores for males with low MTAI
scores all of the scores except Mf lie near the
norm of 50; in the column of T scores for
males with high MTAI scores there are three
entries greater than 60, all on subtle keys
(DS, HyS, and PaS); and there is one T score
of 57.2, also on a subtle key (PdS).

3. Table 5 shows the results of sorting the
209 “valid” profiles (i.e., profiles with K below
70) on the basis of the highest score on
each. A significantly greater proportion of
the teachers scoring high than of teachers
scoring low on the MTAI had their highest


Table 4

Mean T Scores of 45 Male Minnesota Teachers on Certain
Scales of the MMPI, Classified According
to Level of Pupil-Teacher Rapport
as Measured by the MTAI

                              Low        High
Scale                       Rapport    Rapport    Difference    Significant
K                             ...        ...        +14.0          Yes
Hypochondriasis               ...        ...         -8.2          Yes
Symptomatic depression        ...        ...         +0.5          Yes*
  Subtle                      ...        ...        +10.3          Yes
  Obvious                     ...        ...         -7.2          Yes
Hysteria                      ...        ...         +2.7          Yes
  Subtle                      ...        ...        +10.2          Yes
  Obvious                     ...        ...         -7.1          Yes
Psychopathic deviation        ...        ...          ...          No
  Subtle                     49.7       57.2          ...          Yes
  Obvious                    50.3       43.8          ...          Yes
Masculinity-femininity       60.6       62.6          ...          No
Paranoia                     50.8       54.6          ...          No
  Subtle                     48.7       63.6          ...          Yes
  Obvious                    51.7       42.2          ...          Yes
Psychasthenia                52.6       42.7          ...          Yes
Schizophrenia                51.2       43.9         -7.3          Yes
Hypomania                    52.7       48.5         -4.2          No
  Subtle                     51.0       50.6         -0.4          No
  Obvious                    52.4       44.4         -8.0          Yes
Social introversion          50.6       44.6         -6.0          No

* The conclusion that there is a significant difference on this
scale is based on analysis of males and females both, with sex
differences partialed out; the difference is opposite in direction
to that indicated here.




Table 5

Relative Frequency of Codings of MMPI Profiles of 209
Teachers Classified According to Level of
Pupil-Teacher Rapport as Measured by the MTAI

                  Low         High
                Rapport     Rapport
Key             (N=99)
Uncoded           4.0         ...
Hs                6.1*        ...
D                 8.1*        ...
Hy                ...         ...
Pd                8.1*        ...
Mf               15.2         ...
Pa               10.1         ...
Pt                2.0         ...
Sc                4.0         ...
Ma               13.1         ...
Si                ...         ...

Total           100.0         ...

* Significant at the 5% level.
** Significant at the 1% level.


scores on Hy. In addition, high-rapport 
teachers tended to have their highest scores 
on Pd more often than low-rapport teachers, 
while low-rapport teachers tended to be high- 
est more frequently on Hs, D, and Si. 
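
The sorting step behind Table 5 can be sketched roughly as follows. The sketch only picks the scale with the highest T score in each valid profile and tallies percentages; it is a simplification and not Hathaway's full coding system (4), it makes no attempt to reproduce the "Uncoded" category, and the example profiles are hypothetical rather than data from this study. The cutoff of K below 70 for a valid profile is taken from the text above.

```python
from collections import Counter

# Clinical scales considered when looking for the highest score.
CLINICAL_SCALES = ("Hs", "D", "Hy", "Pd", "Mf", "Pa", "Pt", "Sc", "Ma", "Si")
K_CUTOFF = 70.0  # profiles with K at or above this value are treated as invalid


def highest_scale(profile):
    """Return the clinical scale with the highest T score, or None if invalid.

    Ties are resolved in favor of the scale listed first in CLINICAL_SCALES,
    which is an arbitrary choice made for this sketch.
    """
    if profile["K"] >= K_CUTOFF:
        return None
    return max(CLINICAL_SCALES, key=lambda scale: profile[scale])


def peak_percentages(profiles):
    """Percentage of valid profiles whose highest score falls on each scale."""
    peaks = [highest_scale(p) for p in profiles]
    valid = [p for p in peaks if p is not None]
    if not valid:
        return {}
    counts = Counter(valid)
    return {scale: 100.0 * n / len(valid) for scale, n in counts.items()}


def make_profile(k, **elevated):
    """Build a hypothetical profile: every scale at 50 unless stated."""
    profile = {scale: 50.0 for scale in CLINICAL_SCALES}
    profile["K"] = k
    profile.update(elevated)
    return profile


# Hypothetical profiles, for illustration only.
example_profiles = [
    make_profile(62.0, Hy=61.0, Pd=58.0),
    make_profile(55.0, Hy=64.0),
    make_profile(48.0, D=63.0, Si=60.0),
    make_profile(72.0, Hy=66.0),  # K too high: excluded as invalid
]

print(peak_percentages(example_profiles))
```

Comparing such percentages between the high-rapport and low-rapport groups would then call for a test of the difference between two proportions, which is not shown here.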


Discussion 


The results of any study of relationships 
between scores on different self-rating in- 
ventories must always be interpreted with 
caution because of the possibility that set 
factors present in both inventories may ex- 
aggerate the relationships between the person- 
ality traits measured by the two inventories. 
If one neurotic inventory were used as a cri- 
terion in validating another, it is highly prob- 
able that biasing set factors in both tests 
would correlate highly with each other and 
give a spuriously high validity coefficient. 

In the present investigation, the K scale 
for the MMPI affords a measure of what is 
probably the most important of such factors. 
Meehl (6) describes the K scale as measuring
a generalized attitude toward self-rating
inventories which differentiates individuals in- 
clined to get unduly “normal” scores—to 
mark items in a socially acceptable way 
more often than the average person does— 







from individuals inclined to get unduly “ab- 
normal” scores—to mark items in a way that 
shows them in an unfavorable light. This 
inclination may result from a conscious or 
unconscious attempt by the individual to ap- 
pear better or worse than he is, from more 
or less consistent tendencies to interpret state- 
ments on the inventory in atypical ways, or 
from some other idiosyncrasy independent of 
the dimension of personality the inventory is 
designed to measure. 

That such a factor operates in the MTAI
as well as in the MMPI is clearly demonstrated
by the results of this study. The
highly significant difference between high and
low teachers’ mean K scores (see Table 1) is
the most obvious evidence. The fact that the
mean T score on this scale of teachers scoring
high (in the socially acceptable direction)
on the MTAI is above 60 for both sexes indicates
that these teachers had an attitude or
set toward these inventories that caused them
to mark their items in general—regardless of


Table 6

Comparison of T Scores on the MMPI With and
Without the K Correction of Male Teachers
Scoring High and Low on the MTAI

                              Low           High
                            Rapport       Rapport
Scale                       (N = 26)
Hypochondriasis
  With K                      ...           ...
  Without K                   ...           ...
  Difference                  ...           ...
Psychopathic deviation
  With K                      ...           ...
  Without K                   ...           ...
  Difference                  ...           ...
Psychasthenia
  With K                      ...           ...
  Without K                   ...           ...
  Difference                  ...           ...
Schizophrenia
  With K                      ...           ...
  Without K                   ...           ...
  Difference                  ...           ...
Hypomania
  With K                      ...           ...
  Without K                   ...           ...
  Difference                  ...           ...







content—in the “normal” direction much 
more consistently than the typical member 
of the standardization group did; whereas 
the scores of the teachers scoring low on the 
MTAI indicate that these teachers replied to
the items in essentially the same way as do 
typical members of the standardization group. 

The import of the difference between the
groups on the K scale may clearly be seen in
Table 6, which contrasts the T scores with
and without the K correction. When a K
correction is applied to the MMPI scores of
the teachers scoring low on the MTAI, the
average change in the mean T score is an increase
of 0.1 point; when applied to the
MMPI scores of the teachers scoring high on
the MTAI, the K correction produces an average
increase of 8 T-score points. The effect
of this factor is to enable the high-scoring
teachers (MTAI) to hold their MMPI
scores down well below normal, at a mean of
45.5, when K is not used; when the effect of
the factor is removed, their scores have a
mean of 53.6. These teachers manage to
appear “supernormal” on scales on which
they are actually somewhat elevated.

The same thing is shown in a different way
in Table 4, where the mean T scores for the
male teachers on subtle and obvious items
from the same scales scored separately are
shown. There is a highly consistent tendency
for teachers scoring high on the MTAI
to score higher on the subtle than on the obvious
items regardless of what they are supposed
to measure. On the average, the high-rapport
teacher scores 15.2 T-score points
higher on the subtle key than on the obvious
one on the same scale, indicating a high degree
of skill in choosing socially acceptable
responses, regardless of their import with regard
to personality traits. The low-rapport
teacher, on the other hand, shows an average
difference between scores on subtle and obvious
keys for the same scale of only about 1
T-score point, and is actually slightly higher
on the obvious keys than on the subtle ones.

Concluding that this set factor is discrimi- 
nating high- and low-rapport groups indi- 
cates either that this is a contamination in 
the MTAI when used as a criterion of 
teacher-pupil rapport, and we are actually
studying two groups of teachers with differ- 
ent attitudes toward personality tests in- 
stead of with different attitudes toward their 
pupils, or that this difference in set is a real 
difference between the two types of teachers, 
as important as any other. 

We cannot dismiss the first alternative, but 
the second also has some plausibility. We 
may reason thus: the items on the MTAI are 
in the main obvious items in the sense that 
the direction of “correct” response is plain 
to anyone no less sophisticated psychologi- 
cally than the average classroom teacher. 
The teacher whose score is low on this in- 
ventory must be aware that he is answering 
many items in a “wrong” way. The fact 
that he does so could indicate either an 
“Everyone is wrong but me” attitude, or the 
generalized tendency to describe himself un- 
favorably measured by the K scale. The 
evidence in Table 1 indicates that the latter 
is not the case; it is not unreasonable to con- 
clude that this attitude is more common in 
the teacher who does not get along too well 
with pupils. 

Additional support for the second alterna- 
tive is gained through an examination of the 
items in the K scale. If one may assume 
that a teacher who has high rapport with 
pupils is likely to have friendly relations 
with others, is likely to have high confidence 
in the honesty, sincerity, and improvability 
of others, and has a modest evaluation of his 
own virtues and achievements, then it is 
plain why he should obtain a high K score. 
Some of the items are given below with a T 
or an F to indicate the direction of the an- 
swer to increase the K score. 


I have very few quarrels with members of my 
family. (T) 

It makes me impatient to have people ask my ad- 
vice or otherwise interrupt me when I am working 
on something important. (F) 

It takes a lot of argument to convince most people 
of the truth. (F) 

Most people will use somewhat unfair means to 
gain profit or an advantage rather than to lose it. 
(F) 

People often disappoint me. (F) 

I get mad easily and then get over it soon. (F) 

I have often met people who were supposed to be
experts who were no better than I. (F)




I think a great many people exaggerate their mis- 
fortunes in order to gain sympathy and help of 
others. (F) 


What others think of me does not bother me. (F) 
I am against giving money to beggars. (F) 


In either case, there can be no question
of the validity of the MTAI, which was developed
and evaluated by strictly empirical
methods. Teachers with high MTAI scores
do have better rapport on the average than
teachers with low MTAI scores. It is therefore
worthwhile looking at differences not
explainable as due to the K factor.

The other results of the first step (Table 
1) may be interpreted as they stand. The 
significant differences on the D, Mf, and Si 
scales show that (at least among female 
teachers) low-rapport teachers tend to be 
more depressed, less feminine, and more so- 
cially introverted than teachers with high 
teacher-pupil rapport. 

If we are free to conclude that the subtle
keys are unaffected by the K factor, we also
notice that there is evidence that MTAI
scores are related to the traits measured by
the HyS, PdS, and PaS scales. Teachers
scoring high on the MTAI have significantly
higher scores on all three of these scales (see
Table 3), particularly on the PaS scale. If,
since there is no interaction with sex, the T
scores in Table 4 for HyS and PdS are taken
as roughly representative of both sexes, they
indicate that the low-rapport teachers are
about normal, and the high-rapport ones
markedly elevated on these keys. There is
evidence of a differential effect of sex on the
PaS scale, the difference being greater for
males than for females. Differences on these
subtle keys agree fairly well with those on
corresponding corrected total scores in Table
1, although the latter differences fall short of
significance.

The results of the third part of the analy- 
sis (see Table 5) also agree with these. If 
a teacher’s MMPI profile is highest on Hs, D, 
or Si, he is likely to have a low MTAI score; 
if it is highest on Hy or Pd, he is likely to 
have a high MTAI score.


Summary and Conclusions 


The Minnesota Multiphasic Personality In- 
ventory was administered to two groups of 







classroom teachers at opposite extremes of 
the distribution of scores of the standardiza- 
tion sample for the Minnesota Teacher Atti- 
tude Inventory. Differences in mean scores 
of the two groups on the clinical keys for the 
MMPI, with and without the K correction 
and on “subtle” and “obvious” items scored 
separately, and differences in frequencies of 
different codings of profiles were analyzed. 
The following conclusions were reached: 

1. Teachers scoring high on the MTAI
tend to have high elevations on the K scale,
but teachers scoring low on the MTAI do not.

2. Teachers scoring high on the MTAI
tend to score higher than teachers scoring
low on the MTAI on subtle items of the Hy,
Pd, and Pa scales.

3. Teachers scoring low on the MTAI tend
to score higher on the D key and on obvious
items in general than do teachers scoring high
on the MTAI.

4. The highest MMPI score of a teacher
scoring high on the MTAI is likely to be on
the Hy or Pd scale; the highest MMPI score
of a teacher scoring low on the MTAI is
likely to be on the Hs, D, or Si scale.

5. Because of the prominent role of the
set factor measured by the K scale, no conclusions
are justified regarding personality
differences between teachers who have high
rapport with their pupils and teachers who
do not have high rapport with their pupils on
the basis of the results of this study.


Received April 30, 1954. 


References 


1. Cook, W. W., & Hoyt, C. J. Procedure for determining
number and nature of norm groups
for the Minnesota Teacher Attitude Inventory.
Educ. psychol. Measmt, 1952, 12, 562-573.

2. Cook, W. W., Leeds, C. H., & Callis, R. Manual
for the Minnesota Teacher Attitude Inventory.
New York: The Psychological Corporation,
1951.

3. Cook, W. W., & Medley, D. M. Proposed hostility
and pharisaic-virtue scales for the MMPI
designed to measure ability to work harmoniously
with people. J. appl. Psychol.,
1954, 38, 414-418.

4. Hathaway, S. R. A coding system for MMPI
profile classification. J. consult. Psychol.,
1947, 11, 334-337.

5. Hathaway, S. R., & McKinley, J. C. The Minnesota
Multiphasic Personality Inventory.
Minneapolis: The Univer. of Minnesota Press,
1943.

6. Meehl, P. E., & Hathaway, S. R. The K factor
as a suppressor variable in the Minnesota
Multiphasic Personality Inventory. J. appl.
Psychol., 1946, 30, 525-564.

7. Wiener, D. M. Subtle and obvious keys for the
MMPI. J. consult. Psychol., 1948, 12, 164-170.

8. Supplementary Manual for the Minnesota Multiphasic
Personality Inventory. New York:
Psychological Corporation, 1946.





The Journal of Applied Psychology 
Vol. 39, No. 2, 1955 


Development of General Working Population Norms for 
the USES General Aptitude Test Battery * 


Albert Mapou 


U. S. Employment Service, U. S. Department of Labor 


The General Aptitude Test Battery 
(GATB), B-1001, of the United States Em- 
ployment Service (USES) consists of fifteen 
tests which measure ten aptitudes (1). Gen- 
eral working population norms were needed 
to establish standard scores on the same base 
population for all the tests in the battery. 
This permits the evaluation of an individ- 
ual’s test scores on a standard battery, used 
for both research and operating purposes, in 
terms of the same general population norma- 
tive group. Since the GATB is used na- 
tionally for a wide range of occupations, it is 
desirable that general population norms for 
this battery be based on a sample which rep- 
resents, as closely as permitted by available 
data, that portion of the general working 
population of the United States for which 
aptitude tests are most commonly used. The
initial general working population norms for 
the GATB were based on a sample composed 
of the first 519 workers tested with this bat- 
tery. It was recognized that this sample 
probably was not truly representative of the 
general working population; therefore, the 
present study was undertaken to develop 
norms on a larger and more representative 
sample. 


Population 


The base population for the GATB gen- 
eral working population norms study is the 
employed labor force in the age range of 18 
through 54 years as recorded in the 1940 
Census of the Population (3). Since the 
1950 Census report was not available at the 
time this study was conducted, it was not 
possible to use 1950 Census data as originally 
planned. In order to estimate how well the 
1940 data reflect current conditions, reference 
was made to Bureau of the Census statistical 


1 Paper read at Annual Meeting of American Psy- 
chological Association, Cleveland, Ohio, September 
1953. 


adjustments of the 1940 Census data, based 
on a sample from the 1950 Census (5). The 
Census adjustments did not prove to be ex- 
tensive. The proportions of workers in vari- 
ous occupational groups remained substan- 
tially unchanged; differences were found pri- 
marily in the geographic distribution of 
workers. 

The employed labor force, as recorded in the 
1940 Census was 45.2 million (45,166,083). 
For the purpose of this study certain dele- 
tions were made that resulted in an ab- 
stracted base population of 24.2 million 
(24,219,021). (a) The base population was 
restricted to those employed workers in the 
age range of 18 through 54 years. (b) The
base population was further curtailed by 
eliminating all farmers, farm laborers, farm 
managers and foremen; all proprietors, man- 
agers, and officials; and all service workers. 
This latter deletion of specified occupational 
groups was made partly because tests are 
generally not in use in those occupational 
areas, and partly because of the difficulty in 
collecting data for those occupations. 


Sample 


In order to establish a set of norms for the 
working population as defined in the 1940 
Census of the United States, a representative 
sample of 4,000 cases was selected from more 
than 8,000 cases for which appropriate re- 
search data were available. These 4,000 
cases selected for the standard sample have 
been designated as the GATB General Work- 
ing Population Sample. The mean age of the 
standard sample is 30.4 years with a stand- 
ard deviation of 9.9, and the mean education 
is 11.0 years with a standard deviation of 
2.6. According to Census data the median 
educational level for the general working 
population is 10.2 years for males and 11.7 
years for females (4). 







General Working Population Norms 


Table 1

Number and Percentage of the GATB General Working
Population Sample and Percentage of the 1940
Employed Working Population in
Each Geographic Region

                    Percentage of         Percentage of          N of
                    Employed Working      GATB General           GATB General
                    Population            Working Population     Working Population
Region*             (1940 Census)         Sample                 Sample
North Eastern           29.06                 28.22                 1,129
North Central           30.17                 44.65                 1,786
South                   30.17                 21.40                   856
West                    10.60                  5.73                   229

Total                  100.00                100.00                 4,000

* The regions are those employed by the Census Bureau (3).


Sampling Design 


The stratified quota method of sampling 
was employed insofar as practicable. This 
method consists of stratifying the sample to 
make it proportionally representative of the 
base population with respect to selected con- 
trol factors. Occupation, sex, age, and geo- 
graphic location were considered in selecting 
the sample. Since it was not possible, with 
the amount of data available, to achieve the 
desirable stratification with respect to all 
factors considered, occupation was established 
as the primary control factor. 
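
The proportional allocation implied by this design can be sketched as follows. The sketch assigns each stratum a quota equal to its share of the base population times the sample size and settles the rounding with a largest-remainder rule. The function name, the rounding rule, and the stratum proportions in the example are hypothetical choices made for illustration; they are not the figures or the procedure used in the study.

```python
import math
from fractions import Fraction


def allocate_quota(proportions, sample_size):
    """Split sample_size across strata in proportion to the base population.

    proportions maps each stratum to its share as a decimal string such as
    "0.30". Exact fractions avoid floating-point rounding surprises. Any
    cases left over after rounding down go to the strata with the largest
    fractional remainders.
    """
    shares = {name: Fraction(p) for name, p in proportions.items()}
    if sum(shares.values()) != 1:
        raise ValueError("proportions must sum to 1")
    raw = {name: share * sample_size for name, share in shares.items()}
    quota = {name: math.floor(value) for name, value in raw.items()}
    shortfall = sample_size - sum(quota.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - quota[name], reverse=True)
    for name in by_remainder[:shortfall]:
        quota[name] += 1
    return quota


# Hypothetical occupational shares of the base population (illustration only).
hypothetical_shares = {
    "professional": "0.10",
    "clerical_sales": "0.30",
    "craftsmen": "0.17",
    "operatives": "0.32",
    "laborers": "0.11",
}

quotas = allocate_quota(hypothetical_shares, 4000)
print(quotas)
```

Once occupation is fixed as the primary control in this way, the remaining factors (sex, age, and region) can only be approximated within each occupational quota, which is the compromise the text describes.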

Occupation. It was necessary to base oc- 
cupational stratification on the Bureau of the 
Census occupational classification system be- 
cause the population data were compiled ac- 
cording to this system. Within each selected 
major occupational group apportionment was 
made to assure a broad representation of spe- 
cific occupations. 

Sex. Consideration was given to having 
the male-female ratio in the sample reflect 
the proportions found in each occupational 
group. Since the male-female ratio in a 
given occupational area tends to vary with 
economic conditions, use of the 1940 Census 
data as a basis for determining the male-fe- 
male ratio of the sample was questionable. 
The belief that use of a disproportionate 
number of men and women in the GATB 
standard sample would reduce the value of 


131 


the norms for use with groups composed of 
mostly men or mostly women led to the de- 
cision to compose half the sample of males 
and half of females. This ratio was approxi- 
mated rather than obtained exactly in order 
to achieve proper occupational representation 
with available data. 

Age. Since the GATB is seldom used for 
counseling individuals outside the age range 
of 18 through 54 years, and the preponder- 
ance of research data available was for work- 
ers in this age range, it was decided to repre- 
sent that portion of the general working 
population which fell within this age range. 
Therefore, only workers 18 through 54 years 
of age were considered when the occupational 
composition of the base population was determined.
However, a negligible number
(approximately 3½ per cent) of individuals
outside this age range were included in the 
GATB standard sample of 4,000, in order to 
include occupations which would not have 
been represented otherwise. 

Geographic distribution. Table 1 shows 
the geographic distribution, by region, of the 


Table 2

Number (Total, Male, Female) and Percentage of the
GATB General Working Population Sample
in Selected Occupational Groups

                                                           Number
Occupational Group*                     Per Cent    Total     Male    Female
Professional and Semiprofessional
  Workers (D.O.T. 0)                       ...        ...      ...      ...
Clerical, Sales, and Kindred
  Workers (D.O.T. 1)                       ...        ...      ...      ...
Craftsmen, Foremen, and Kindred
  Workers (D.O.T. 4 and 5)                 7.33       694      ...       50
Operatives and Kindred Workers
  (D.O.T. 6 and 7)                         ...      1,238      ...      943
Laborers, except Farm and Mine
  (D.O.T. 8 and 9)                        10.90       436      360      ...

Total                                    100.00     4,000    1,834    2,166

* The occupational groups are those used by the Bureau of
the Census (3). Corresponding Dictionary of Occupational
Titles (6), or D.O.T., major occupational code groups are noted
in parentheses.







GATB standard sample as compared with 
that of the employed working population. 
The results of a chi-square test indicated 
that the obtained differences between theo- 
retically desirable and sampled proportions 
in the four geographic areas are greater than 
chance. The data available did not enable 
adjustments to correct for the differences. 
However, it is believed that although the 
sample does not contain exact proportions of 
workers found in the various geographic 
areas, the differences are not large enough to 
cause any appreciable error in the norms. 
Furthermore, the geographic distribution of 
the employed population shifts from one eco- 
nomic period to another. It should be noted, 
however, that all parts of the country are 
represented in the standard sample. 
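
A chi-square comparison of this kind can be sketched with the regional figures given in Table 1. The sketch assumes a goodness-of-fit test in which the expected frequency for each region is the 1940 Census percentage applied to the 4,000 cases; the source does not report the statistic or the exact form of the test, so this is an illustration rather than a reproduction. The critical value 11.345 is the standard tabled 1 per cent point of chi-square for 3 degrees of freedom.

```python
# Sketch of a chi-square goodness-of-fit test on the Table 1 figures.
# Assumption: expected counts are Census percentages applied to the sample N.

CENSUS_PERCENT = {
    "North Eastern": 29.06,
    "North Central": 30.17,
    "South": 30.17,
    "West": 10.60,
}
SAMPLE_N = {
    "North Eastern": 1129,
    "North Central": 1786,
    "South": 856,
    "West": 229,
}
CHI2_CRIT_01_DF3 = 11.345  # tabled value, 3 degrees of freedom, 1% level


def chi_square_gof(observed, expected_percent):
    """Sum of (observed - expected)^2 / expected over all categories."""
    total = sum(observed.values())
    statistic = 0.0
    for region, count in observed.items():
        expected = total * expected_percent[region] / 100.0
        statistic += (count - expected) ** 2 / expected
    return statistic


chi2 = chi_square_gof(SAMPLE_N, CENSUS_PERCENT)
print(round(chi2, 1), chi2 > CHI2_CRIT_01_DF3)
```

Under these assumptions the statistic lies far above the critical value, in line with the statement that the regional differences are greater than chance. Most of it comes from the North Central region, which makes up 44.65 per cent of the sample against 30.17 per cent of the Census population.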


Composition of Sample 

The occupational composition of the GATB 
General Working Population Sample is shown 
in Table 2. The proportions of the standard 
sample in the various occupational groups are 
exactly the same as those of the base popu- 
lation. 

Table 2 shows only the broad occupational 
groups represented in the GATB standard
sample. However, the GATB standard sam- 
ple of 4,000 includes tested workers in a va- 
riety of specific occupations within each broad 
occupational group. Test data were used for 
workers in 19 specific occupations in the Pro- 
fessional and Semiprofessional Group; 31 spe- 
cific occupations in the Clerical and Sales 
Group; 24 specific occupations in the Crafts- 
men, Foremen, and Kindred Group; 30 spe- 
cific occupations in the Operatives and Kin- 
dred Group; and for workers in 11 specific 
occupations in the Laborer Group. 


Establishment of Norms 


The means and standard deviations of each 
GATB test were calculated for the sample of 
4,000 and these results were compared with 
the means and standard deviations obtained 
for the sample of 519 which served as a basis 
for establishing the initial GATB general 
working population norms. Statistically sig- 
nificant differences were obtained for eight of 
the fifteen GATB tests, with the sample of 
4,000 achieving higher mean scores than the 
sample of 519 on seven of the eight tests. 
Although the data indicated that eight of the 
obtained differences are “true” or stable dif- 


Table 3

Means (M), Standard Deviations (σ), and Critical Ratios (CR) of the Differences Between Means of
Scores on B-1001, for the Two GATB General Working Population Samples of N = 4,000 and N = 519

                                    N = 4,000           N = 519
Test                                M       σ           M       σ          CR
A-Tool Matching                    ...     ...         ...     ...        2.24*
B-Name Comparison                  ...     ...         ...     ...         .63
C-H Markings                       ...     ...         ...     ...         .29
D-Computation                      ...     ...         ...     ...        4.20**
F-Two-Dimensional Space            ...     ...         ...     ...        1.31
G-Speed                            ...     ...         ...     ...        1.45
H-Three-Dimensional Space          ...     ...         ...     ...        4.65**
I-Arithmetic Reason                ...     ...         ...     ...        7.46**
J-Vocabulary                       ...     ...         ...     ...        1.39
K-Mark Making                      ...     ...         ...     ...        3.20**
L-Form Matching                    ...     ...         ...     ...        1.64
M-Place                            ...     ...         ...     ...        4.44**
N-Turn                             ...     ...         ...     ...         .73
O-Assemble                         ...     ...         ...     ...        3.26**
P-Disassemble                      ...     ...         ...     ...        4.06**

* Significant at .05 level.
** Significant at .01 level.







ferences, none of them is very great in magni- 
tude. These results are shown in Table 3. 
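
The critical ratios in Table 3 are presumably of the usual form, the difference between two means divided by the standard error of that difference. The sketch below assumes independent samples and the large-sample formula; the source does not state the formula, and the means and standard deviations in the example are hypothetical. Only the two sample sizes, 4,000 and 519, come from the text.

```python
import math


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two independent means over its standard error."""
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error


def significance_label(cr):
    """Mark a critical ratio with the conventional two-tailed cutoffs."""
    size = abs(cr)
    if size >= 2.58:
        return "**"  # .01 level
    if size >= 1.96:
        return "*"   # .05 level
    return ""


# Hypothetical means and standard deviations; only the Ns are from the study.
example_cr = critical_ratio(26.0, 6.0, 4000, 25.0, 6.0, 519)
print(round(example_cr, 2), significance_label(example_cr))
```

With a sample of 4,000 the standard error is dominated by the smaller sample of 519, so a difference of one raw-score point can reach the .01 level. This is consistent with the remark that the differences are stable without being large.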

Standard score distributions were estab- 
lished on the basis of the means and stand- 
ard deviations obtained on the GATB tests 
for the sample of 4,000. The raw test scores 
were adjusted so that the standard score dis- 
tribution for each aptitude measured by the 
GATB would have a general working popu- 
lation mean of 100 and a standard deviation 
of 20. 
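
A conversion of this kind can be sketched as a linear rescaling. The sketch assumes a simple linear transformation of a single raw score, which the source does not spell out, and the norm mean and standard deviation in the example are hypothetical. Only the target mean of 100 and standard deviation of 20 come from the text.

```python
def standard_score(raw, norm_mean, norm_sd, target_mean=100.0, target_sd=20.0):
    """Rescale a raw score so the norm group has the target mean and SD."""
    if norm_sd <= 0:
        raise ValueError("norm standard deviation must be positive")
    z = (raw - norm_mean) / norm_sd
    return target_mean + target_sd * z


# Hypothetical norm-group values for one test (illustration only).
NORM_MEAN = 30.0
NORM_SD = 8.0

for raw_score in (26.0, 30.0, 38.0):
    print(raw_score, standard_score(raw_score, NORM_MEAN, NORM_SD))
# 26.0 90.0
# 30.0 100.0
# 38.0 120.0
```

Because every aptitude is placed on the same scale, a worker's scores on different aptitudes can be compared directly against the same normative group.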

Conclusions 


The initial GATB general working popula- 
tion norms, based on the sample of 519 work- 
ers, served as a good basis for evaluating in- 
dividual and group test results on the GATB 
during the first five years after the GATB 
was issued. Although statistically significant 
differences were found between the latest 
GATB general working population norms 
based on the sample of 4,000 and the initial 
norms based on the sample of 519, none of 
the obtained differences was very great in 
magnitude. In the present study it was pos- 
sible to achieve, in the standard sample of 
4,000 cases, exact representation of the base
population with respect to broad occupational
groupings. This was considered to be a de- 
sirable objective since the GATB is intended 
to be used for the evaluation of occupational 
potentialities. As more data become avail- 
able, the GATB standard sample should be 
expanded to achieve better representation of 
the general working population with respect 
to all appropriate control factors. 


Received April 18, 1954. 


References 


1. Dvorak, Beatrice J. The new USES general aptitude
test battery. J. appl. Psychol., 1947, 31,
372-376.

2. Dvorak, Beatrice J. Pre-employment tests. Mill
and Factory, 1952, 50, 100-103.

3. U. S. Department of Commerce, Bureau of the
Census. The labor force. Sixteenth census
of the United States, 1940, Vol. III, Part I,
Washington: U. S. Government Printing Office,
1943.

4. U. S. Department of Commerce, Bureau of the
Census. Labor force. Current population reports,
1949, No. 14 (Series p-50).

5. U. S. Department of Commerce, Bureau of the
Census. 1950 Census of population preliminary
report, 1951, No. 2 (Series pc-7).

6. U. S. Department of Labor, Bureau of Employment
Security. Dictionary of occupational
titles. Washington: U. S. Government Printing
Office, 1949.





The Journal of Applied Psychology 
Vol. 39, No. 2, 1955 


Validities of Three Vocational Interest Keys for U. S. 
Navy Yeomen ¹


Dallis K. Perry 


Officer Education Research Laboratory, Air Force Personnel and Training Research Center 


In view of the important part vocational in- 
terest inventories have come to play in coun- 
seling, guidance, and selection, it is somewhat 
surprising to note that little mention has been 
made in the literature of cross validation of 
interest scales. We are speaking here of va- 
lidity in terms of differentiation provided by 
a scale between a reference group and the 
criterion vocational group on which the scale 
is based. Cross validity, then, is in terms of 
differentiation between the same reference 
group and a new group from the same vo- 
cation as the criterion group. Strong (3, pp. 
637-650) reports research which led him to 
recommend using all available cases in the 
criterion group and basing norms on the 
same cases rather than building a scale on 
half the available cases and basing norms on 
the other half. Since cross validation is simi- 
lar to the latter procedure, the same research 
results probably explain the lack of cross 
validation of Strong’s scales. Clark (2) used 
cross validation as one criterion in research 
on methods of selecting items for interest 
keys. Cross validities for keys constructed 
by all his methods were quite satisfactory. 
Although the study included only two vo- 
cational groups, the results are encouraging 
indications of the stability of interest keys 
developed by Clark’s methods. 

In a study of certain methodological prob- 
lems of interest measurement there appeared 
an opportunity to investigate validity of 
three vocational interest keys. Two vocational
interest inventories, the Strong Vocational
Interest Blank (SVIB) and the Minnesota
Vocational Interest Inventory (MVII),
were administered to a number of Navy yeomen.
There is a Yeoman key for the MVII,
but none for the SVIB. The SVIB does,
however, have an Office Worker key, and
yeomen are clerical and office workers in the
Navy. There is also an MVII key for the
related civilian occupation of shipping and
stock clerk. Consequently, it was possible
directly to cross validate the Yeoman key and
to determine how well the validities of the
Office Worker and Shipping-Stock Clerk keys
generalize to yeomen.


¹ This study was completed while the author was
Research Fellow in the Industrial Relations Center,
University of Minnesota. It was a portion of a
dissertation submitted to the faculty of the University
of Minnesota in partial fulfillment of requirements
for the degree of Doctor of Philosophy. The author
wishes to express his appreciation to his advisors,
Professors D. G. Paterson and K. E. Clark, for their
invaluable assistance in all phases of the research.


Procedure 


The SVIB and MVII were administered along
with another interest inventory to 163 yeomen
attending Yeoman Schools at two U. S. Naval
Training Centers.² Results are based on 135 sets of usable
inventories. A face sheet asked the yeomen if they 
had done civilian work similar to that which they 
were doing as yeomen. Of the 135, 38, or 28 per 
cent, marked “yes,” and 97, or 72 per cent, marked 
“no.” They were also asked if they would choose 
the same career if they could start over again at 
age 18. A total of 85, or 63 per cent, marked “yes,” 
while 50, or 37 per cent, marked “no.” Other char- 
acteristics of this yeoman group (validation yeo- 
men) and of the criterion group for each of the keys 
investigated are shown in Table 1. The similarity 
between the two yeoman groups on all character- 
istics except grade is noteworthy. The yeomen dif- 
fer from the other two groups mostly in age. How- 
ever, 91 per cent of the shipping and stock clerks 
responded “no” to the question, “Would you choose 
the same occupation if starting over again at age
18?”

The distribution of scores made by the validation
yeomen on each of the three interest keys (Yeoman,
Shipping-Stock Clerk, and Office Worker) was
compared in each case with the distribution of scores
made by the criterion and reference groups on which
the key was built. In addition, the scores of those
yeomen who indicated occupational satisfaction by
saying they would again choose the same career
were compared with scores of “dissatisfied” yeomen
on all keys and with criterion and reference group
scores on the Office Worker key. Comparisons were
made in terms of percentage of overlapping and of
mean difference.


² The samples of yeomen were obtained under the
auspices of the Chief of Naval Personnel and through
the cooperation of the Commanding Officers, U. S.
Naval Training Centers, Norfolk, Virginia, and San
Diego, California. The cooperation of both these
officers and the yeomen is sincerely appreciated.


Table 1

Personal Characteristics of Validation Yeoman Group
and of Criterion Groups for Keys Investigated

                                        Group
                         Validation  Criterion  Shipping and  Office
Characteristic           Yeomen      Yeomen     Stock Clerks  Workers

Average age              --          25.9*      39.8*         33.2
Average education        --          --         --            11.5
Years in occupation
  (rating)               --          --         --            --
Pay grade (N)
  Chief Yeoman           --          --
  Yeoman 1/c             --          --
  Yeoman 2/c             --          --
  Yeoman 3/c             --          --
  Seaman 1/c             --          --

* Median.
† Data not available.
-- Entry not legible in this copy. Legible entries whose
column could not be determined: 12.4*, 10.8*, “3 or more.”

A vocational interest key may be considered
“good” if it satisfactorily differentiates its criterion
group from its reference group. Since a key
constructed only from items which receive significantly
different percentages of responses from two groups
will almost certainly show a statistically significant
mean difference between those two groups, a measure
of practical significance is needed as well.
Percentage of overlapping, as defined by Tilton (4), is such
a measure and one which is commonly used in
interest measurement. A key with satisfactory
validity will have relatively little overlapping of
criterion and reference group scores. To have
satisfactory cross validity or generalized validity a key
should show little more overlapping of the
validation and reference groups than of the criterion and
reference groups. In addition there should be little
difference in means and nearly 100 per cent
overlapping of the criterion and validation groups.


Table 2

Means and Standard Deviations of Scores on Three
Vocational Interest Keys of Validation Yeomen
and of Criterion and Reference Groups

Key             Group                   N     Mean    SD

Yeoman          Criterion yeomen        --    24.2    41.2
                Validation yeomen       --    23.4    35.6
                Navy men-in-general     --    --      41.2

Shipping-       Shipping-stock clerks   --    --      --
Stock Clerk     Validation yeomen       --    --      --
                Tradesmen-in-general    --    --      --

Office Worker   Office workers          --    --      34.2
                Validation yeomen       --    --      35.6
                Men-in-general          --    --      41.4

-- Entry not legible in this copy. Legible entries for the
Shipping-Stock Clerk key whose cell could not be
determined: 10.6, 8.9.
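
The percentage of overlapping used in this procedure can be illustrated with a short sketch. It assumes the usual formulation of Tilton's measure for two normal distributions, in which the distance between the means is expressed in units of the average standard deviation; the paper itself gives no formula, and the function name `tilton_overlap` is hypothetical. The example figures are the Yeoman-key means and standard deviations as they can be read from the scan of Table 2, so they should be treated as illustrative.

```python
from statistics import NormalDist


def tilton_overlap(mean_a, sd_a, mean_b, sd_b):
    """Percentage of overlapping of two score distributions.

    Assumes both distributions are normal and uses the average of the two
    standard deviations as a common spread, so the overlap is
    2 * Phi(-|mean_a - mean_b| / (sd_a + sd_b)), expressed as a percentage.
    """
    if sd_a <= 0 or sd_b <= 0:
        raise ValueError("standard deviations must be positive")
    half_distance = abs(mean_a - mean_b) / (sd_a + sd_b)
    return 200.0 * NormalDist().cdf(-half_distance)


# Criterion yeomen vs. validation yeomen on the Yeoman key (values as read from the scan).
overlap = tilton_overlap(24.2, 41.2, 23.4, 35.6)
print(f"overlap: {overlap:.1f} per cent")
```

Under these assumptions the two yeoman groups overlap by roughly 99 per cent, which agrees with the figure reported in the Summary. Identical distributions give 100 per cent, and the value falls toward zero as the means move apart.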


Results and Discussion 


In Table 2 are given for each key the N’s, 
means, and standard deviations of the valida- 
tion yeoman group and of the criterion and 
reference groups for the key. Table 3 shows 
how each pair of the three groups is differ- 
entiated by each key. Percentages of over- 
lapping and mean differences are both shown. 

Yeoman key. It can be seen that the two
yeoman groups score almost exactly alike on
the Yeoman key. With respect to each other
and in comparison with the Navy men-in-general
group, differences between them are
extremely slight. The Yeoman key cross
validates as well as could be desired even
though it was based on a criterion group of
only 102 cases. Although satisfactory cross
validation of one, two, or several interest
keys does not prove that all existing or
future keys are valid, the results are quite
encouraging. Furthermore, most interest keys
are based on many more than 102 cases, and
Strong (3) reports that keys based on
samples of 200 or 300 or more are more stable
than keys based on smaller samples.


Table 3

Mean Differences and Percentages of Overlapping on
Three Vocational Interest Keys of Validation
Yeomen, Criterion Groups, and Reference Groups

                                              Percentage    Mean
Key             Groups Compared               Overlapping   Difference

Yeoman          Navy men-in-general vs.
                  criterion yeomen            --            51.0*
                Navy men-in-general vs.
                  validation yeomen           --            50.2
                Validation yeomen vs.
                  criterion yeomen            --            0.8

Shipping-       Tradesmen-in-general vs.
Stock Clerk       shipping-stock clerks       --            --
                Tradesmen-in-general vs.
                  validation yeomen           --            --
                Validation yeomen vs.
                  shipping-stock clerks       --            --

Office Worker   Men-in-general vs.
                  office workers              --            57.8*
                Men-in-general vs.
                  validation yeomen           --            39.6*
                Validation yeomen vs.
                  office workers              --            18.2*

* Significant at the 1 per cent level.
-- Entry not legible in this copy.

Shipping-Stock Clerk key. Table 3 shows 
that the Shipping-Stock Clerk key differenti- 
ates yeomen from the reference group almost 
as well as it differentiates the criterion and 
reference groups. The Shipping-Stock Clerk 
key appears to be a valid key for yeomen. 
Despite differences between the Shipping- 
Stock Clerk and Yeoman keys, shown by 
large differences in means and standard devia- 
tions of the yeomen on the two keys, yeomen 
and shipping and stock clerks get similar 
scores on the Shipping-Stock Clerk key. This 
fact suggests a core of interests common to 
members of the two occupations. The Ship- 
ping-Stock Clerk key does a better job of 
differentiating yeomen from tradesmen-in- 
general than the Yeoman key does of differ- 
entiating yeomen from Navy men-in-general. 
This does not mean that the Shipping-Stock 
Clerk key provides a better indication of yeo- 
men’s interests than the Yeoman key itself. 
Only if the same reference group were used 
in both cases could a direct comparison of 
the two keys be made. 

Office Worker key. Although the Office 
Worker key satisfactorily differentiates the 
yeomen from men-in-general, it does not do 
so as well as it differentiates office workers 
from men-in-general. The yeomen show 16 
per cent more overlap with men-in-general 
than do office workers. Also, the office work- 
ers and yeomen differ significantly in means, 
and they overlap only 79 per cent. The simi- 
larity of the interests of Navy yeomen to 
those of civilian office workers is shown by 
the high mean score of yeomen and their dif- 
ference from men-in-general on the Office 
Worker key. The percentage overlap of yeo- 
men with office workers, however, is not suffi- 
ciently large to indicate that the Office 
Worker key provides a satisfactory measure 
of the interests of yeomen. 




Reasons for the failure of the validity of 
the Office Worker key to generalize com- 
pletely to yeomen cannot be determined from 
the data available. There are several pos- 
sibilities: 

1. The Office Worker key may simply be 
an unsatisfactory key. Since it is based on 
a comparison of 326 office workers with 
Strong’s large men-in-general reference group, 
this possibility seems unlikely. 

2. The Office Worker key was published in 
1938. The yeomen were tested in 1953. 
Perhaps the interests of office workers, both 
civilian and military, have changed during 
the intervening 15 years to such an extent 
that the 1938 key is no longer satisfactory. 
This possibility also seems unlikely. 

3. The Office Worker key is based on a 
group of “professional” office workers. Yeo- 
men are Navy enlisted men. Despite many 
similarities in the duties performed by the 
two groups, they might be quite dissimilar 
people simply because of factors of selection, 
placement, and survival. The groups differ in 
age and education, though these are probably 
not important factors as shown by the yeoman 
vs. shipping and stock clerk comparison. Al- 
though Clark (1) has found that vocational 
interests cut across Navy-civilian lines, dif- 
ferent kinds of people may go into Navy 
office work than go into civilian office work. 


Occupational satisfaction. Additional
information about the validities of the three
keys was sought by obtaining the mean scores
of two groups of the yeomen: those who
stated that they would choose the same
occupation if starting over again at age 18 and
those who stated that they would choose a
different occupation if starting over again at
age 18. Table 4 gives the means, standard
deviations, and mean differences of these two
groups on each of the three keys. The
yeomen who give evidence of occupational
satisfaction score higher on each key. The
differences are all large and highly significant
statistically.


Table 4

Means, Standard Deviations, and Mean Differences on
Three Vocational Interest Keys of Occupationally
Satisfied and Nonsatisfied Yeomen

                Occupational                               Mean
Key             Satisfaction              N   Mean   SD    Diff.

Yeoman          Choose same career        85  32.0   34.0  21.7*
                Choose different career   50  10.3   36.8

Shipping-       Choose same career        85   7.9    8.6   9.4*
Stock Clerk     Choose different career   50  -1.5   11.0

Office Worker   Choose same career        85  48.0   32.4  --
                Choose different career   50  20.7   34.2

* Significant at the 1 per cent level.
-- Entry not legible in this copy.
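
The size of these differences relative to their sampling error can be checked from the summary statistics alone. The sketch below computes a critical ratio (mean difference divided by its standard error, with unpooled variances). The paper does not say which significance test was used, so this is an assumed method, and the function name `critical_ratio` is hypothetical. The input figures are the Table 4 values as they can be read from the scan, including a standard deviation of 8.6 where the scan shows "86".

```python
from math import sqrt


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two independent means divided by its standard error."""
    standard_error = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / standard_error


# (satisfied mean, SD, N, dissatisfied mean, SD, N), as read from the scan of Table 4.
table_4 = {
    "Yeoman": (32.0, 34.0, 85, 10.3, 36.8, 50),
    "Shipping-Stock Clerk": (7.9, 8.6, 85, -1.5, 11.0, 50),
    "Office Worker": (48.0, 32.4, 85, 20.7, 34.2, 50),
}

ratios = {key: critical_ratio(*values) for key, values in table_4.items()}
for key, ratio in ratios.items():
    print(f"{key}: critical ratio = {ratio:.2f}")
```

With these readings every ratio exceeds 2.58, the large-sample two-tailed value for the 1 per cent level, which is consistent with the statement that all three differences are highly significant.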

The high mean score of the occupationally 
satisfied yeoman group on the Yeoman key is 
additional evidence of the practical validity 
of the key. The key does a better job of 
differentiating satisfied yeomen from Navy 
men-in-general than it does of differentiating 
yeomen in general from Navy men-in-gen- 
eral, despite the fact that it was constructed 
on “run-of-the-mill” yeomen. 

The scores of the occupationally satisfied 
yeomen on the Shipping-Stock Clerk key pre- 
sent a picture similar to that for the Yeoman 
key. The satisfied group obtained a mean 
score not only much higher than that of the 
dissatisfied group but also higher than the 
mean score of the criterion group for the key. 
This fact is additional evidence of the prac- 
tical validity for yeomen of the Shipping- 
Stock Clerk key. This key, too, identifies 
occupationally satisfied yeomen better than 
it identifies yeomen in general. This result 
is somewhat paradoxical in view of the ap- 
parent occupational dissatisfaction of the cri- 
terion shipping and stock clerk group. To- 
gether with the average age and time in oc- 
cupation of the shipping and stock clerks it 
suggests that perhaps the shipping and stock 
clerk criterion group consisted of persons 
with interests in higher-level clerical duties 
than their abilities, or other factors, had al- 
lowed them to obtain. 

The mean score of the occupationally satisfied
yeomen on the Office Worker key may be
interpreted as evidence in support of the
possibility that yeomen in general are a different
kind of people than civilian office workers.
The satisfied yeomen obtained much higher
scores than the dissatisfied yeomen on the
Office Worker key, as they did on the other
two keys. Differences on the Office Worker
key between the occupationally satisfied
yeomen and the criterion and reference groups
for the key are shown in Table 5. Similar
comparisons for the Yeoman and Shipping-Stock
Clerk keys were not made, since the
validities of these latter keys were found to
be satisfactory for the entire yeoman sample.
The difference between the occupationally
satisfied yeomen and the criterion office
worker sample is statistically significant at
the 5 per cent level, but the amount of
overlapping is 90 per cent, compared with 79 per
cent for the total yeoman sample. The 90
per cent overlapping of the two groups
indicates a difference of little practical
significance. Also, the amount of overlapping with
men-in-general has decreased from 61 per
cent for the total yeoman sample to 50 per
cent for the occupationally satisfied group.
Apparently the yeomen who are relatively
satisfied with their jobs have interests quite
similar to those of civilian office workers.


Table 5

Mean Differences and Percentages of Overlapping of
Occupationally Satisfied Yeomen with Criterion
and Reference Groups for Office Worker Key

                                        Percentage    Mean
Groups Compared                         Overlapping   Difference

Satisfied yeomen vs. men-in-general     50.0          49.7**
Satisfied yeomen vs. office workers     90.4           8.1*

* Significant at the 5 per cent level.
** Significant at the 1 per cent level.

It appears that use of any of the three keys 
investigated here would be of assistance in 
the selection and assignment of Navy per- 
sonnel as yeomen and could result in an in- 
crease in the proportion of yeomen who are 
satisfied with their jobs. 


Summary 


1. Scores of a group of 135 Navy yeomen 
on the Office Worker key of the Strong Vo- 
cational Interest Blank and on the Yeoman 
and Shipping-Stock Clerk keys of the Min- 
nesota Vocational Interest Inventory were 
analyzed to determine the validities of these 
keys for yeomen. Analysis was in terms of 







mean difference and percentage of overlap- 
ping of scores of the present yeoman group 
with scores of criterion and reference groups 
for each key. Differences between yeomen 
indicating occupational satisfaction and those 
indicating dissatisfaction also were considered. 

2. Overlapping of the validation and cri- 
terion yeoman groups on the Yeoman key 
was 99 per cent, and overlappings of the two 
groups with Navy men-in-general were not 
appreciably different. It was concluded that 
the validity of the Yeoman key is high and 
extremely stable. Particularly high scores of 
yeomen indicating occupational satisfaction 
were considered to be further evidence of the 
practical validity of the Yeoman key. 

3. The validation yeoman group scored 
high on the Shipping-Stock Clerk key and 
overlapped considerably with shipping and 
stock clerks. Also, this key satisfactorily 
differentiated both the yeoman and the ship- 
ping and stock clerk groups from tradesmen- 
in-general and differentiated them to about 
the same extent. Yeomen indicating occupa- 
tional satisfaction obtained a higher mean 
score than either yeomen in general or ship- 
ping and stock clerks. It was concluded that 


the validity of the Shipping-Stock Clerk key 
generalizes to yeomen, and particularly to oc- 
cupationally satisfied yeomen. 

4. Although yeomen scored very high on 
Strong’s Office Worker key and were satis- 
factorily differentiated from men-in-general 
on the key, they overlapped only 79 per cent 




with civilian office workers, and the difference 
between the differentiation of office workers 
from men-in-general and the differentiation 
of yeomen from men-in-general was consid- 
erable. It was concluded that the Office 
Worker key does not satisfactorily represent 
the interests of yeomen in general. Several 
possible reasons for this were suggested. Al- 
though none could be definitely accepted on 
the basis of the data available, evidence that 
the interests of occupationally satisfied yeo- 
men are quite satisfactorily represented by 
the Office Worker key suggested that dissimi- 
larities between yeomen in general and ci- 
vilian office workers are chiefly responsible. 

5. Finally it was concluded that the use of 
any of the three keys investigated could be of 
assistance in the selection and assignment of 
Navy personnel as yeomen. 


Received April 21, 1954. 


References 


1. Clark, K. E. Differences in vocational interests
of men in seven Navy rates. Minneapolis:
Univer. of Minnesota, Dept. of Psychology,
1950. (Tech. Rep. No. 4.)

2. Clark, K. E., & Gee, H. H. Selecting items for
interest inventory keys. J. appl. Psychol.,
1954, 38, 12-17.

3. Strong, E. K., Jr. Vocational interests of men
and women. Stanford: Stanford Univer.
Press, 1943.

4. Tilton, J. W. The measurement of overlapping.
J. educ. Psychol., 1937, 28, 656-662.







Note on Another Use of the Sentence-Completion Technique 


Norman Gekoski and Eleanore S. Isard 


Temple University 


One of the problems in test and question- 
naire construction has centered around the 
collection of items. Several methods have 
been used. They include the pirating of 
items from similar questionnaires, the crea- 
tion and writing of items, the use of essay 
writing from which items are culled, the use 
of the critical incident technique, and com- 
binations of the above. 

This is to suggest the use of the sentence- 
completion technique for item collection. The 
advantages can be best illustrated by an ex- 
ample. In a recent effort at item collection 
for an attitude scale both the essay-writing 
(modified by critical incident) and the sen- 
tence-completion methods were tried. In the 
essay writing six groups, totaling 127 persons, 
were used. Each of the groups was given 
two or three different questions on which to 
write the essays. There were approximately 
320 essays. The sentence-completion method 
was used in a seventh group consisting of 39 
persons, all of whom responded to a 40-item 




blank. The essays had to be read and reread 
several times in order to cull items. Obtain- 
ing items from the sentence completions was 
easier and much less time consuming. Many 
items from the essays had to be rewritten be- 
cause of ambiguities, grammatical construc- 
tion, double-barreled statements, etc. Most 
of the sentence completions, however, were 
amenable to intact lifting. The essay method
yielded 124 usable items from a gross of less 
than 150, after editing and rewriting. The 
sentence-completion method yielded a gross 
of over 400 items, from which 160 were se- 
lected. There were very few items from the 
essays that did not show up in the sentence 
completions. 

The advantages of the sentence-completion 
method for item collection can be summarized 
as follows: (a) The yield is large for the ef- 
fort expended; (b) items have the desired 
orientation; (c) items are in the words of the 
population on whom they are to be used. 


A Correction of “The Inference of Accident Liability from 
the Accident Record”


Alexander Mintz 
City College of New York 


An error in a formula in my recent paper, 
“The inference of accident liability from the 
accident record” (1), was pointed out to me 
by Morton S. Raff of the Bureau of Public
Roads, U. S. Dept. of Commerce, whom I
wish to thank. The formula is the last one
on p. 42 and occurs again on p. 44, and
involves the substitution of $c^{p+i}$ for $(c+1)^{p+i}$
in the numerator of the fraction. The correct
formula is

$$\frac{(c+1)^{p+i}}{\Gamma(p+i)}\; e^{-(c+1)\lambda}\; \lambda^{p+i-1}$$


The error occurred in the preparation of the 
manuscript; the correct formula was used in 


the computations and the verbal description 
of the formula (p. 42) is correct. 
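
The effect of the misprinted numerator can be illustrated numerically. The sketch below reads the corrected expression as a gamma density in the liability variable, with shape p + i and rate c + 1. That reading, the symbol names `lam`, `p`, `c`, and `i`, and the parameter values are all assumptions made for illustration, since the formula is badly garbled in this copy and the original paper is not reproduced here.

```python
from math import exp, gamma


def liability_density(lam, p, c, i):
    """Gamma density with shape p + i and rate c + 1, evaluated at lam."""
    shape = p + i
    rate = c + 1.0
    return rate ** shape / gamma(shape) * exp(-rate * lam) * lam ** (shape - 1)


def integrate(func, lower, upper, steps=100_000):
    """Composite trapezoid rule on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for k in range(1, steps):
        total += func(lower + k * width)
    return total * width


# Hypothetical parameter values, chosen only to exercise the formula.
p, c, i = 2.0, 3.0, 1
area = integrate(lambda lam: liability_density(lam, p, c, i), 0.0, 20.0)

# With c ** (p + i) in the numerator instead of (c + 1) ** (p + i),
# every value is scaled by (c / (c + 1)) ** (p + i), and so is the area.
misprint_area = area * (c / (c + 1.0)) ** (p + i)

print(f"area with (c + 1) in the numerator: {area:.4f}")
print(f"area with c in the numerator:       {misprint_area:.4f}")
```

Under this reading the corrected expression integrates to one, as a density should, while the misprinted numerator would leave the total area short of one.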

However, while the formula used in the 
computations was being checked, a small
arithmetical error affecting one set of
computations was noticed. The second column
of percentages in Table 2 should read as 72.4,
34.4, 12.2, 3.3, 0.7, 0.2, and 0.1, instead
of 76.6, 40.9, 15.9, 4.9, 1.1, 0.3, and 0.1,
respectively. The conclusions were not
materially affected by the error.


Reference 


1. Mintz, A. The inference of accident liability from 
the accident record. J. appl. Psychol., 1954, 
38, 41-46. 





Book Reviews 


Caplow, Theodore. The sociology of work. 
Minneapolis: Univer. of Minnesota Press, 
1954. Pp. 330. $5.00. 


Psychologists, particularly those in the in- 
dustrial, counseling, and social areas of spe- 
cialization, should be interested in this book. 
It gives a perspective which ought to be use- 
ful in both teaching and research. 

In 1946, Industrial Relations and the So- 
cial Order by Wilbert E. Moore made its ap- 
pearance; and in 1951, Industrial Sociology 
by Delbert C. Miller and William H. Form 
was published. The Sociology of Work by 
Theodore Caplow represents a third and the 
most recent contribution of the sociologist to 
industrial and occupational problems. 

Caplow treats the sociology of work pri- 
marily as “the study of those social roles 
which arise from the classification of men 
by the work they do.” The topics covered
include: The Assignment of Work, Measurement
of Occupational Status, Vertical Mobility,
Other Mobilities, Occupational Institutions,
and Occupational Ideologies. Later
chapters include: Sociology of the Labor
Market, The Labor Union as an Occupational
Association, Vocational Choice, Occupations
of Women, Occupation and Family, and
Working Conditions. In the appendix there
is a 10-page description of the labor force
and 20 pages of bibliography.

The sociological approach gives a frame of 
reference and a certain realism about the 
phenomenon of work that adds greatly to the 
approaches in modern psychology. For ex- 
ample, the chapter on vertical mobility, which 
is one of the best in the book, presents and 
discusses four types of such mobility. This 
is a refreshing viewpoint compared with the 
conventional emphasis on promotion and up- 
grading within establishments. Likewise, the 
chapter on occupation and the family, in- 
cluding the role of the housewife, represents 
a significant contribution. 

On the other hand, one may be disap- 
pointed in the chapter on vocational choice. 
The author criticizes vocational guidance in
terms of tools and techniques, as would a
psychologist. The richness that a sociologist
could provide is missing. To say that the
scales of the Strong Vocational Interest
Blanks are of dubious validity is to say little.
The real issue is whether the concept of
vocational interest measurement any longer has
a place. This reviewer's hunch is that it does
not. The author quotes at length from
Ginzberg et al. on occupational choice, but this
leaves much to be desired.

The chapter on the sociology of the labor
market is an interesting and informative one.
The markets discussed include the
bureaucratic labor market as well as the industrial,
the craftsman, and those of common labor.

The Dictionary of Occupational Titles
receives no mention in the book although less
used and less important occupational
classifications and information materials are
described. This seemed like a strange omission.

The book could have been improved by
increasing the use of tables for presenting data
found in the narrative. Only two or three
tables appear outside the appendix and one
of these, in the chapter on occupations for
women, presents prewar data from the U. S.
Employment Service out of context and
without explanation.

Many occupational terms are loosely used,
which makes understanding difficult for
persons who are accustomed to having terms
defined and used consistently.

The volume should be useful collateral
reading for psychology students in industrial,
personnel, and occupational information
courses.


Carroll L. Shartle 
The Ohio State University 


Kornhauser, A., Dubin, R., and Ross, A. M. 
(Eds.) Industrial conflict. New York: 
McGraw-Hill, 1954. Pp. 551. $6.00. 


The interdisciplinary efforts of Kornhauser 
(psychology), Dubin (sociology), and Ross 
(economics) are an attempt to fill an ob- 
vious gap in our knowledge about and under- 
standing of the dynamics of industrial con- 
flict. For the most part, the editors have 







done an excellent job in the selection of their 
contributors as well as in the organizational 
schema of the volume of essays. The con- 
tributing authors have each written brief ex- 
positions of their thoughts and/or research 
regarding those aspects of industrial conflict 
that have concerned them. These works have 
been grouped by the editors into five basic 
sections: (a) basic issues; (b) sources; (c)
dealing with industrial conflict; (d) indus- 
trial conflict in other societies; and (e) pres- 
ent and future status. Unlike most com- 
pendia, the essays which make up the bulk 
of the volume apparently were written spe- 
cifically for this work and consequently tend 
to cope with pertinent issues with consider- 
ably more relevancy than is usually true of 
such volumes. 

The editors indicate that they hope to meet 
the need for a comprehensive and integrated 
work in this area, but the varying emphases 
and points of view expressed do not provide 
the reader with a single frame of reference 
with which to view, let alone interpret, phe- 
nomena relating to industrial conflict. One 
wishes that the editors had followed the
pattern established in Chapter 1 of their
volume, and had written an interpretive work
derived from such sources as the present book
includes. This would have best met the
crucial need of the present. The book they have
produced would have considerably more value 
at a time when a number of systematically 
integrated, conceptual works are available. 

Admittedly, as the editors point out, any 
subject matter as broad and complex as “in- 
dustrial conflict” is difficult, if not impossible, 
for one scholar or one discipline to encompass. 
The editors have, to a degree, attempted to 
orient and integrate in their insightful first 
chapter and in their concluding remarks. Al- 
though this material is unfortunately brief, it 
does provide the reader with some direction. 
The reader, however, after completing the 
volume, is apt to feel that he has been con- 
fronted by many discrete points of view, 
which, although they give him insight into 
the breadth and complexity of the problem, 
leave him with the task of integrating and 
evaluating conceptualizations which may be 
beyond his professional ken. 


It seems doubtful that this volume will be 
found suitable as a text for a given course. 
Realistically, the student (and often the in- 
structor) is not apt to have sufficient back- 
ground in the various disciplines represented 
to utilize optimally the material presented. 
The volume, however, would seem to have 
considerable value as a secondary source book 
for such widely dispersed courses as labor re- 
lations, industrial sociology, and personnel 
psychology. It certainly will be a valuable 
and much used asset on the bookshelf of any 
social scientist who is concerned with indus- 
trial conflict and its implications for our so- 
ciety. 

Hjalmar Rosen 


Institute of Labor and Industrial Relations, 
University of Illinois 


Peterson, Eleanor M. Aspects of readability 
in the social studies. New York: Bureau 
of Publications, Teachers College, Colum- 
bia Univer., 1954. Pp. ix +118. $3.50. 


Scientific interest in readability has in- 
creased greatly in recent years, both in the 
development of measures and in their appli- 
cation to printed material. Peterson’s book 
(apparently based upon a doctoral disserta- 
tion, although this is not specifically stated) 
is an addition to the growing literature in this 
field. It differs from most previous work, 
however, in two ways. First, Peterson has 
attempted to avoid the better known (and
better investigated) aspects of readability,
such as vocabulary difficulty, sentence length,
and sentence structure, and has concentrated
on two that are relatively less well known, 
the organization of material and its interest 
value to the reader. The importance of these 
two aspects has long been recognized, but so 
also has the difficulty encountered in trying 
to measure them objectively. The second 
difference is that Peterson has presented a 
study of the effects of varied material upon 
the tested comprehension of students. Very 
few such experimental studies have been 
made to date. 

Peterson’s procedure was to: 

1. Select two passages of about 1,000 words from
a recent text being used in high school world history
courses.


2. Develop two modified versions of each of the 
original passages, one being better “organized” and 
the other more “interesting.” An attempt was made 
to keep the original and the modified passages at 
the same level in terms of length, number of sen- 
tences, number of “hard” words, etc. 

3. Give the passages to 99 tenth-grade students 
from three high schools, a particular subject receiving
the two original, the two organization, or the
two interest passages. 

4. Measure the effect of the different experimental 
versions by the use of (a) a 62-item multiple-choice 
test, and (b) a free-response item. In addition, certain
introspective reports on relative enjoyment of
the passages were gathered from subjects other than 
those mentioned above. 


The book provides abundant evidence of 
Peterson’s careful conduct of the study. The 
multiple-choice test used is a good example. 
Peterson first developed items to cover areas 
termed vocabulary, main ideas, details, in- 
ferences, and relationships. Realizing she 
may have biased some items through having 
written both the items and the modified pas- 
sages herself, she also had an outside test 
specialist prepare a test independently. She 
then combined the two tests in her final form. 
The statistical analyses were done with equal 
thoroughness. 


Peterson found that the modified passages
produced significantly greater comprehension
than the original passages. This was found
for both the organization and interest modifi- 
cations, and was reflected by most of the dif- 
ferent kinds of test items (vocabulary, main 
ideas, etc.). The interest versions were, in 
addition, read with greater enjoyment than 
the original versions. 

The chief questions for this reviewer con- 
cern the nature of the changes made to pro- 
vide better organization and greater interest. 
The 14 ground rules for improving organiza- 
tion and the 12 for increasing interest appear 
to be rather broadly and subjectively defined. 
Could writers, working independently, make 
specific use of them without changing con- 
tent? Should “organization” include such 
things as “repetition” and “definition of vo- 
cabulary”? Should “interest” include “sim- 
plification of the style of writing,” such as 
the reduction of an 84-word sentence to 11
words? From the literature on human learning
and readability one might expect repetition,
definition, and simpler style in themselves
to produce differences such as Peterson
found between versions.

These questions would probably arise from 
almost any study of contextual material, be- 
cause of the complex nature of language and 
the gross variables that must be used today. 
Only further work can answer these ques- 
tions and provide such needed measures as a 
unit of content for written material. Cer- 
tainly this pioneer study, as it stands, pro- 
vides both a stimulus for future studies and 
valuable suggestions for the practicing text- 
book writer and teacher to follow. 


George R. Klare 


Ohio University 


Dvorine, I. Dvorine pseudo-isochromatic
plates. (Rev. Ed.) Baltimore: Waverly
Press, 1953. Pp. 30. $7.50.


In this revised edition of his plates for
testing color vision, the author has profited
by experience and constructive criticism. The
number of colored plates has been reduced
from 60 to 23. In 15 of these, the response
is made by reading numbers; in 8, by tracing
a path. As in the original, there is a series
of colored discs for checking nomenclature of
“dark tints” and “light tints.”

The present edition, now labeled “pseudo-isochromatic
plates,” is a big improvement
over the original test. All the plates have
been validated by other researchers,
evaluated by an expert in the field, or
extensively checked by the author.

The 8 plates made up of pathways to be
traced may be used to advantage for testing
young children and illiterates, or to check
results obtained on the first 15 plates.

The test plates are assembled in a loose-leaf
ring binder, a handy form for
either individual or group testing. It is likely
that the test will receive wide use in a variety
of situations.


Miles A. Tinker 


University of Minnesota 







Finlay, W. W., Sartain, A. Q., and Tate, 
W. M. Human behavior in industry. New
York: McGraw-Hill, 1954. Pp. xi + 247. 
$4.00. 

This book was written to assist executives, 
supervisors, and foremen in human relations 
problems. It covers a conglomeration of sub- 
jects including motivation, attitudes, public 
relations, wages, communication, labor rela- 
tions, American ideology, and industrial or- 
ganization. 

The book is a development of the authors’ 
notes for a one-week course in human rela- 
tions for executives. It reads as if obtained 
by tape-recording lectures. Frequently it deviates
from the King’s English insofar as
vocabulary and sentence structure are con- 
cerned, but this is not entirely unfavorable. 
The style of writing will appeal to the audi- 
ence to which it is directed. The authors 
have kept footnotes and references to a mini- 
mum, partly because they “did not want to 
confuse the reader.” Also, “we expect our
readers to accept or reject what we have to
say, not because someone else—or many peo- 
ple—have already said it, but because it fits 
the facts in the light of the reader’s experience.”
The material does appear to “make
sense” and the fact that it stems from university
professors will give it additional prestige.
This may result in uncritical acceptance of 
the views presented. This is unfortunate 
with respect to statements like “it is no won- 
der that red heads are temperamental,” “all 
of our knowledge comes from our group ex- 
periences,” and other statements where the 
authors have gone to extremes in the use of 
picturesque language or in making broad 
statements going beyond scientific evidence. 


Psychologists will react differently to this. 
Some will condemn the authors for being un- 
scientific (though they make no claim to be 
scientific) and others will say that they have 
kept their audience in mind and have written 
effectively for it. 

Each author wrote different chapters and 
the result is decidedly spotty. The chapter 
on “Use and Misuse of Authority” misses the 
boat and could well have been omitted. The 
same is true of the chapter on “The Mid- 
Century American Scene,” which is as bad as 
the next chapter on “The Changing Scene in 
Labor Relations” is good. Some subject mat- 
ter is discussed very briefly. The subject of 
references is disposed of in a single para- 
graph of seven lines. One wonders if sub- 
jects deserving as little space as this might 
not well have been omitted entirely. Per- 
haps if some of the ineffective, and somewhat 
irrelevant, chapters had been omitted, more 
space would have been available for some of 
the briefly discussed items. 

The greatest asset of the book, and it is a 
real asset, is the authors’ skill in phrasing 
concepts in such a way as to be acceptable to 
industrial readers. This was especially evi- 
dent in a discussion on unconscious mental 
processes, and in debunking the typical view 
that the economic man is all important. 

All in all, this is another book which falls 
between the extremes of scientific accuracy 
on the one hand, and overgeneralized, popular 
presentation on the other. It is better than 
the typical book in this field. 


Clifford E. Jurgensen 


Minneapolis Gas Company 





New Books, Monographs, and Pamphlets 


Books, monographs, and pamphlets for listing and possible review should be sent to Dr. John G. Darley, 
Graduate School, University of Minnesota, Minneapolis 14, Minnesota. 


The adolescent exceptional child. Langhorne, 
Pa.: The Woods Schools, 1954. Pp. 79. 
(No author or editor.) 

Christianity and anti-Semitism. Nicolas 
Berdyaev. New York: Philosophical Li- 
brary, 1954. Pp. 58. $2.75. 

A Rorschach workbook. Lucille Hollander 
Blum, Helen H. Davidson, and Nina D. 
Fieldsteel. New York: International Uni- 
versities Press, 1954. Pp. 167. $2.00. 

Labor-management relations in Illini City. 
W. Ellison Chalmers. Vol. I. Champaign, 
Ill.: Univer. of Illinois, 1954. Pp. 809. 
$10.00. 

Labor-management relations in Illini City. 
W. Ellison Chalmers. Vol. II. Champaign, 
Ill.: Univer. of Illinois, 1954. Pp. 662. 
$7.50. 

Freedom from fear. Lester L. Coleman. 
New York: Hawthorn Books, 1954. Pp. 
285. $3.95. 

Time distortion in hypnosis. Linn F. Cooper
and Milton H. Erickson. Baltimore: Williams
& Wilkins, 1954. Pp. 191. $4.00.

Constancy of interest factor patterns within
the specific vocation of foreign missioner.
Reverend Paul F. D’Arcy. Washington,
D. C.: The Catholic Univer. of America
Press, 1954. Pp. 54. $1.00.

Edwards Personal Preference Schedule. Allen
L. Edwards. New York: Psychological
Corp., 1954. Pp. 17.

Statistical methods for the behavioral sciences.
Allen L. Edwards. New York:
Rinehart & Co., 1954. Pp. 542. $6.50.

A dictionary of pastoral psychology. Vergilius
Ferm. New York: Philosophical Library,
1955. Pp. 336. $6.00.

Bases for industrial relations. Waldo E.
Fisher. Pasadena: California Institute of
Technology, 1954. Pp. 27. $1.50.

How to be a modern leader. Lawrence K.
Frank. New York: Stratford Press, 1954.
Pp. 62. $1.00.

The office of the premier in French foreign
policy-making: an application of decision-making
analysis. Edgar S. Furniss, Jr.
Princeton: Princeton Univer. Press, 1954.
Pp. 67.


Zest for work. Rexford Hersey. New York: 
Harper & Brothers, 1954. Pp. 93. 

Sentence completion. James Quinter Holsop- 
ple and Florence R. Miale. Springfield, 
Ill.: Charles C Thomas, 1954. Pp. 177. 
$5.50. 

Evaluation and education of the cerebral 
palsied child. Thomas W. Hopkins, Harry 
V. Bice, and Kathryn C. Colton. Wash- 
ington, D. C.: International Council for 
Exceptional Children, 1954. Pp. 114. 
$1.60. 

Interviewing in social research. Herbert H.
Hyman. Chicago: Univer. of Chicago 
Press, 1954. Pp. 415. $8.00. 

Hugh Roy Cullen. Ed Kilman and Theon 
Wright. (Eds.) New York: Prentice- 
Hall, 1954. Pp. 376. $4.00. 

Personnel tests for industry. Charles R. 
Langmuir. New York: Psychological Corp., 
1954. Pp. 8. 

The peer status of sixth and seventh grade 
children. Frances Laughlin. New York: 
Teachers College, Columbia Univer., 1954. 
Pp. 85. $2.75. 

Deprived children. Hilda Lewis. New York:
Oxford Univer. Press, 1954. Pp. 163.
$1.55.

Clinical vs. statistical prediction. Paul E.
Meehl. Minneapolis: Univer. of Minnesota
Press, 1954. Pp. 149. $3.00.

Counseling with young people. C. Eugene
Morris. New York: Association Press,
1954. Pp. 144. $3.00.

An outline of abnormal psychology. Gardner
Murphy and Arthur J. Bachrach. (Eds.)
New York: The Modern Library, 1954.
Pp. 597. $1.45.

Training for human relations. F. J. Roethlisberger.
Boston: Harvard Univer. Business
School, 1954. Pp. 198. $2.00.

Treasury of philosophy. Dagobert D. Runes.
(Ed.) New York: Philosophical Library,
1954. Pp. 1280. $15.00.

Cultural difference and medical care. Lyle
Saunders. New York: Russell Sage Foundation,
1954. Pp. 317. $4.50.





MULTIPLE APTITUDE TESTS


DEVISED BY 

David Segel, Ph. D. (Stanford). Specialist in Tests and Measurements. 
U. S. Office of Education 

Evelyn Raskin, Ph. D. (Minnesota). Assistant Professor of 
Psychology, Brooklyn College 


DESIGNED TO 

provide valid and reliable information for students in Grades 7-13
and adults in nine primary aptitude areas and four basic
factors. The nine tests are: Test 1, Word Meaning; Test 2,
Paragraph Meaning; Test 3, Language Usage; Test 4, Routine Clerical
Facility; Test 5, Arithmetic Reasoning; Test 6, Arithmetic
Computation; Test 7, Applied Science and Mechanics; Test 8, Spatial
Relations-Two Dimensions; and Test 9, Spatial Relations-Three
Dimensions. The four factors are: I, Verbal Comprehension;
II, Perceptual Speed; III, Numerical Reasoning; and IV,
Spatial Visualization.


RESULTS HELP TO 

furnish information which will (1) help examinees understand their
aptitudes better so as to make realistic vocational plans and
choose the school subjects in which they will enjoy the greatest
success, and (2) aid counselors and other school personnel in
guiding individual students and adjusting curricular offerings
to their need.






We invite you to inspect these tests. Order your specimen set now. 
Specimen sets include one copy of each of the nine tests, 96-page
manual, nine scoring keys, extended profile sheet, transparent profile
sheet, class record sheet, and both answer sheets.

The price postpaid is $1.75. 


CALIFORNIA TEST BUREAU 


Los Angeles, Calif. • New Cumberland, Penn. • Madison, Wis. • Dallas, Texas








Portable Kit carries its own legs, furniture, dolls, and manual 


18 x 20 closed 


36 x 20 House when opened 


7 Rooms 
Kitchen 
Living Room 
Dining Room 
3 Bedrooms 


Bathroom 


Permanent Walls Furniture 


Closets 
2 Bedroom Closets 


Kitchen Closet 


Doors 
Front and Back 
Closets 


Bathroom Doll 


Send Order to: 





Dramatic Play Kit


Price $65.00 


COMPLETE WITH 
MANUAL BY 


Dr. E. Ellis Graham
Dr. Lillian E. Whitmore 


Shipping charges 
prepaid 


Furniture 


Stationary in kitchen 
and bathroom 


The recommended block
style, additional blocks
in kit to be used as
needed


14 soft, plastic bending 


dolls 
6 adults 4 children


2 adolescents 2 babies 


Whitmore Drama Kits, Dr. Lillian E. Whitmore 


Psychological Services for Children 


University of Denver, Denver 10, Colorado 


























PRESENT DAY 
PSYCHOLOGY 


Edited by A. A. ROBACK 


A definitive volume of 40 original contributions embracing practically 
the whole range of psychology from the neurological basis to military and 
parapsychology, each chapter written by an expert in his field expressly for 


this work. 


FROM THE TABLE OF CONTENTS 


RECENT FINDINGS IN GENERAL NEUROLOGY
Joseph G. Keegan

ISSUES AND RESULTS IN SENSORY PSYCHOLOGY
P. Ratoosh

THE STATUS OF EMOTION IN CONTEMPORARY PSYCHOLOGY
Magda Arnold

MIDCENTURY PSYCHOSOMATICS
James W. D. Hartman

CLINICAL PSYCHOLOGY
W. G. Eliasberg

RECENT DEVELOPMENTS IN PSYCHOTHERAPY
Emil A. Gutheil

INDIVIDUAL PSYCHOLOGY
Rudolf Dreikurs

ABNORMAL PSYCHOLOGY
Philip L. Harriman

SOCIAL PSYCHOLOGY
Eugene L. Gaier

APPLIED PSYCHOLOGY
Harold E. Burtt

INTRODUCTION TO EXPERIMENTAL PARAPSYCHOLOGY
J. B. Rhine

SOME RECENT EXPERIMENTAL WORK IN PSYCHODIAGNOSTICS
Werner Wolff

PRESENT-DAY PSYCHOLOGY OF SPEECH
Emil Froeschels

MILESTONES IN PSYCHOANALYSIS
Leon J. Saul and Andrew S. Watson

PROJECTIVE TECHNIQUES IN CONTEMPORARY PSYCHOLOGY
Leopold Bellak

EDUCATIONAL PSYCHOLOGY
Gordon C. Hanson

ADOLESCENCE
Karl C. Garrison

HYPNOTHERAPY
Milton V. Kline

TRENDS IN STATISTICS AND PROBABILITY IN PSYCHOLOGY
Herbert Solomon

INTEGRATIONAL PSYCHOLOGY
Clarence Leuba


Approx. 1,000 pages . . . $12.00


PHILOSOPHICAL LIBRARY, Publishers 


15 East 40th St., Desk 186 


New York 16, N.Y. 


Expedite shipment by prepayment 

















JOURNAL OF APPLIED PSYCHOLOGY 


[Table of back numbers: year, volume, available numbers, price per number, and price per volume, for Volume 1 (1917) through Volume 39 (1955). The individual entries are not recoverable from the scan. The legible entries list a price of $1.50 per number; Volume 39 (1955) is available by subscription at $7.00, foreign $7.50.]

Volumes 1 through 10 had four numbers per year; the remaining volumes have six numbers per year. 


Postage prepaid on U.S. orders. Add $.50 per volume on foreign orders. All stock subject to prior 
sale. 


The American Psychological Association gives the following discounts on orders for any one journal: 


10% on orders of $ 50.00 and over 
20% on orders of $100.00 and over 
30% on orders of $150.00 and over 


Current subscriptions and orders for back numbers should be addressed to 


AMERICAN PSYCHOLOGICAL ASSOCIATION 
1333 Sixteenth Street N.W. 
Washington 6, D. C. 








Enthusiastic Comments on 


Dynamic and Abnormal Psychology 


W. S. Taylor, Smith College 


“It is refreshing to see emphasis on phe- 
nomena instead of psychiatric syndromes.” 
—Mortimer Garrison, Jr., University of 
Pennsylvania 


“The organization of the book brings out 
with striking clarity the principal concepts 
and units which are partly the content of 
the first basic course in psychology with 


their relationship to abnormality.”—Cora 
L. Friedline, Randolph-Macon Woman’s 
College 


“An excellent discussion of the basic psy- 
chological phenomena necessary for the 
understanding of both normal and ab- 
normal behavior.”—T. E. Hannum, Iowa 
State College 


American Book Company


College Division, 55 Fifth Avenue, New York 3, N. Y. 
Cincinnati • Chicago • Atlanta • Dallas • San Francisco








REVUE DE PSYCHOLOGIE APPLIQUEE 


QUARTERLY PUBLICATION


Editors: Dr. P. PICHOT and P. RENNES


This journal is addressed both to clinicians (psychologists or psychiatrists)
and to psychotechnicians (vocational counselors, occupational psychologists).
Two sections are oriented toward application: Techniques and Methods
in Occupational Psychology, and Techniques and Methods in Clinical
Psychology. These sections aim to set out the fundamental techniques in a
precise and concrete form, to clarify doubtful points, and to present, even in
the form of a memorandum, the practical methods for carrying out applications.
They are supplemented by General Reviews, which take stock of research in
fields of direct interest to application. The Original Works section carries
studies of a more general nature.

Finally, the remaining sections, Chronicles and Documentation, and Abstracts,
give a picture of daily life in applied psychology on both the technical and
the professional level.


Editorial and Administrative Offices: 15, rue Henri-Heine, PARIS (XVIe)
C. C. PARIS 5851-62


SUBSCRIPTIONS: 1 year, France: 1,000 francs; Foreign: 1,300 francs
SPECIMEN COPY ON REQUEST
































Available this month


THE DYNAMICS OF PERSONAL ADJUSTMENT 


By GEORGE F. J. LEHNER, Ph.D., University of California (L. A.), and 
ELLA A. KUBE, Ph.D., Occidental College 


• A clinical psychologist and a social psychologist bring together the personal
and social forces that operate in human lives, to show the interaction of human
needs and the environment in which they must be satisfied.


• Emphasizing social learning, the book shows the continuity of adjustment
through the whole life cycle by following an individual through successive
relationships with parents, teachers, friends, co-workers, marriage partner and his
own children.


• Special attention is paid to individual differences and the importance of
accepting them in oneself and in others.


• Development of a high level of frustration tolerance, and of an attitude that
views problems as a challenge to new learning, is stressed.


• Clarity of style, absence of technical terms, clear definition of needed advanced
concepts with apt examples enhance readability and usefulness. 


Approx. 480 pages . 54%" 28%" . April, 1955


INDUSTRIAL PSYCHOLOGY, 3rd Edition 
By JOSEPH TIFFIN, Ph.D., Professor of Psychology, Purdue University 


A well-rounded factual text on most of the psychological procedures of known 
value to industry, analyzed and evaluated in the light of experimental research. 


This broad survey of applications reveals industrial psychology as a
dynamic area of applied psychology, with data that stimulates the student’s
thinking and helps coordinate other areas of behavior science with industrial
applications.


A WORKBOOK provides ter prdete on actual situations to illustrate techniques of ana- 
lyzing and interpreting chm abe 

Answers are available on adoption.

Objective tests are obtainable from the author, free on adoption. 


Text: 640 pages . 5442814" . Published 1952 
Workbook: 88 pages . 814211" . Published 1952 


For approval copies write
PRENTICE-HALL, Inc., 70 FIFTH AVENUE, NEW YORK 11, N.Y.











